anastigmatix.net

Markup: minimalist markup for the PostScript® language

    Markup is a procedure-set resource for the PostScript language that changes the form of input to free lines of text that can be interrupted by fragments of PostScript programming. The obvious application is text formatting, but Markup can be adapted to many jobs that involve reading material line-by-line.

    What Markup (by itself) is not

    Quite sophisticated resources for in-PostScript text formatting exist, as can be seen in my direct PostScript resources survey. They offer full justification, tables, columns, and many other capabilities associated with complete text typesetting systems. Those systems occupy 40 to 200 kilobytes or more of interpreter memory. Markup, at about eight kilobytes, is intended not to replace them but to complement them. It has no preordained idea what to do with the lines it reads, but it can be linked to the procedures of a sophisticated typesetting resource to drive it with a convenient form of input. It works well in this capacity with the TinyDict, which does not have free-text input provisions of its own.

    Thumbnail image of a business letter formatted with Markup Basic

    What Markup is (even by itself)

    Markup does include simple provisions, usable without any larger typesetting library, for line-for-line setting of text: producing one set line from each input line, without filling words to fit, whether ragged right, ragged left, or centered. Markup's especially simple relationship to the underlying PostScript language makes those standalone facilities versatile enough for everyday business correspondence, promotional flyers, labeling of figures, and other jobs that do not demand the greater automation the elaborate systems provide. There is no plan to add significantly to these built-in capabilities. Markup is meant always to be lightweight enough to be an attractive front end for other libraries and, by not being tied to any one in particular, to stimulate use and development of new and existing libraries built on it or usable with it. Its standalone capabilities are meant to be adequate for a range of simple tasks but never to become the main point.

    What Markup does differently

    Markup is intended to stake out a distinctive position in the relationship of the markup language to the underlying PostScript. Some of the typesetting libraries I have surveyed introduce markup codes with an all new look and new rules for scanning and syntax, new mechanisms for defining commands or selecting fonts, and so on. Markup strives to avoid reinventing anything that PostScript already does easily and well, and to behave as nearly as possible as a natural outgrowth of PostScript. This example is written for Markup with its own built-in Basic formatter:

    Markup may offer no dedicated new codes for, for example, font switching
    but will certainly accept \{/Times-Italic 12 selectfont}the appropriate
    ordinary PostScript\{/Times-Roman 12 selectfont} anywhere in the input. The effect
    depends on the back-end typesetting library in use, but if it follows the
    same philosophy of transparency, as Markup's Basic certainly does, this will
    do just what you expect.
    

    If there are only a few font changes in a one-off document, it would be hard to beat that form for clarity: there is no burden of learning or remembering new commands for font selection, or which fonts have been assigned to them. In a longer document, or where a consistent style is important, it will make sense to define some compact abbreviations, so that style changes can be made in one place. That, too, is no harder than PostScript already makes it:

    \{
      /ro {currentfont /Times-Roman 12 selectfont} bind def
      /it {currentfont /Times-Italic 12 selectfont} bind def
      /last {setfont} bind def
    }\ro This text is in Roman; \it this is emphasized, but to
    \ro really \last emphasize something that's already in italics, one
    sometimes goes back to Roman. \last It is easy to set up
    a \/quoteleft last \/quoteright  command when the stack behaves as expected.
    
    I like using the standard PostScript glyph names for the quotes rather than
    remembering to write \<60>last\<27> in StandardEncoding,
    or \<91>last\<92> in CE encoding\/emdash but that's a matter of
    preference.
    
    Thumbnail image of a Markup Basic sampler

    Markup's only rules

    Markup's clear family resemblance to PostScript comes from its PostScript-like scanning rules, which in turn come from its use of PostScript's own scanning operators. The rules are as simple as can be:

    To express those rules in PostScript took even fewer words than to describe them here. A few natural consequences are worth mentioning:

    Markup: reference

    Markup is a ProcSet resource. To make it available to your own code, include in the setup section of your file:

    /net.anastigmatix.Markup /ProcSet findresource begin
    

    The findresource will succeed if you have made the Markup resource file available in any of these ways:

    Markup relies on another resource, MetaPre (the eight kilobyte memory figure given earlier is the total for both). You will need that file also. If you use the first method, you should include both files in the prolog of your document, MetaPre first. The other methods should Just Work as long as both files are where they need to be. In any case, your document only needs the single findresource line shown above. A findresource for MetaPre is not needed unless you also use MetaPre features in your own document. You pay no penalty to do so, as the resource must be there anyway.

    The resource files are in a compact form. That is for efficiency, not to keep you from viewing them; there is a script for that on the resource packaging page.

    The Markup dictionary is read-only. Before creating any definitions, you will want either userdict begin or your own dict begin so that you have a writable dictionary on top of the dictionary stack.
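    The need for a writable dictionary can be seen without Markup at all. The following is a minimal sketch in plain PostScript; demo-readonly, answer, mine, and def-failed are hypothetical names chosen for the illustration, and the small read-only dictionary merely stands in for the one that findresource returns.

```postscript
% A small read-only dictionary standing in for the Markup resource dictionary.
userdict /demo-readonly 1 dict dup /answer 42 put readonly put

demo-readonly begin                     % the read-only dictionary is now on top
  mark
  { /mine 1 def } stopped               % def targets the top dictionary, so it fails (invalidaccess)
  userdict exch /def-failed exch put    % remember whether the def was stopped
  cleartomark                           % discard whatever the failed def left behind

  userdict begin                        % a writable dictionary goes on top
    /mine 1 def                         % this definition lands in userdict
    answer pop                          % names in the read-only dictionary are still found
  end
end
```

    After this runs, def-failed is true and mine is defined in userdict, which is the arrangement the paragraph above recommends.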

    Markup dictionary contents

    This section describes the contents of the read-only dictionary that is returned by /net.anastigmatix.Markup /ProcSet findresource.

    Markup proper

    The dictionary contains one definition that implements Markup itself, reading and processing free text input as described in the introduction.

    \markup
    dict \markup -

    \markup configures itself according to the supplied parameter dictionary dict and begins reading and processing input until it reaches end of input, or the PostScript operator stop is executed. It reads from the file given as SourceFile in the parameter dictionary, supplying a definition based on currentfile if the entry is not present. If currentfile is the source, ordinary PostScript interpretation of the file resumes after a stop is executed.

    Lines are read by readline. If readline reads a complete line (terminated by newline), the line is a FullLine if it has nonzero length, a BareLine otherwise, and the corresponding procedure is executed with the line on the stack.

    If readline does not read a complete line, either the end of input has been reached, or an interruption has been reached, defined by the values EODCount and EODString. The PartLine procedure is executed with the partial line read on the stack, and then the token operator is used to read a single token from SourceFile.

    If token succeeds, the procedure HandleToken is executed with the token on the stack. If HandleToken returns (without executing stop), the procedure HandleResult is executed. If that also returns without executing stop, \markup freshly fetches SourceFile, EODCount, and EODString from the parameter dictionary in case their values were changed in handling the token, and resumes reading lines from the (possibly changed) SourceFile.

    If token returns false, the end of SourceFile has been reached. If no EOFToken entry is present in the dictionary, \markup completes. If an EOFToken is present, \markup executes HandleToken and HandleResult just as if that token had been read and, if stop has not been executed, freshly fetches SourceFile, EODCount, and EODString, and resumes reading from the (presumably changed) SourceFile.
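    The parameters EODCount and EODString share their names with the parameters of PostScript's standard SubFileDecode filter, and the cycle described above can be imitated with that filter. The following sketch is an assumption about the mechanism, not a statement of how \markup is written; src, lines, buf, part, complete, tok, and gottoken are hypothetical names, and ReusableStreamDecode (LanguageLevel 3) is used only to obtain a file object from a string.

```postscript
% Stand-ins for SourceFile and LineBuffer.
/src (plain text \\1 2 add\n) /ReusableStreamDecode filter def
/buf 192 string def

% Reading through this filter ends at the first backslash (EODCount 0, EODString (\\)).
/lines src 0 (\\) /SubFileDecode filter def

lines buf readline          % -> (plain text ) false : interrupted before any newline
/complete exch def
dup length string copy      % copy the text out of buf, which would be reused
/part exch def

src token                   % -> 1 true : one token is read from the underlying file
/gottoken exch def
/tok exch def
```

    A false result from readline corresponds to the PartLine case; the single token that follows is what would be given to HandleToken. Building a fresh filter on src would then resume reading the rest of the line.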

    The parameter dictionary passed to \markup may be crafted for some underlying library for typesetting (or even some other purpose), to cause \markup to execute the appropriate procedures of that library. The dictionary may contain additional entries controlling the interface to that library and of no interest to \markup. The entries that are meaningful to \markup are described here:

    Key Purpose
    BareLine Procedure to be executed when readline has returned true, meaning a newline was encountered, and the line read was of zero length. This can happen for an actual bare line in the input, or for the final readline of a line that ends with a self-delimiting PostScript token. For consistency with FullLine and PartLine, the stack contains (a) whatever was on it when \markup was invoked (except for the parameter dictionary), (b) as modified by any embedded PostScript tokens since, and not consumed by HandleToken or HandleResult, and (c) on top, the string just read, though in this case it is known to be empty; in some uses (simple line-by-line setting), it can make sense for BareLine and FullLine to be the same procedure. When used with paragraph-at-a-time systems, BareLine will usually be defined to invoke the library procedure for filling and setting the completed paragraph. \markup fetches this value only once, so changing it from embedded PostScript code has no effect, but nothing stops the supplied procedure looking at other keys and having changeable behavior.
    EODCount An integer that, in combination with EODString, determines how embedded-PostScript “interruptions” are recognized. Determines how many instances of EODString can be encountered in the input before reading is interrupted; overlapping instances are not multiply counted. If the value is zero, the first occurrence of EODString interrupts reading and is not read as part of the text. If this entry is not present in the dictionary, \markup adds a definition of zero. If EODString is empty, reading will be interrupted when exactly EODCount bytes have been read; this combination is not likely to have practical use in \markup. This value is freshly fetched from the dictionary after any embedded token has been handled, so embedded PostScript tokens may change it.
    EODString A string that, in combination with EODCount, determines how embedded-PostScript “interruptions” are recognized. See EODCount. If this entry is not present in the dictionary, \markup adds a definition of (\\). This value is freshly fetched from the dictionary after any embedded token has been handled, so embedded PostScript tokens may change it.
    EOFToken (optional) When the end of SourceFile has been reached, ordinarily \markup completes. If this entry is present, its value is treated just as if it had been returned by token at the end of the file; HandleToken and HandleResult are executed, and if stop was not executed, reading resumes from the (presumably changed) SourceFile. This is the mechanism by which a file-inclusion operation can be supplied. The idea is for the file-include procedure to store the new file in SourceFile and supply an EOFToken that restores the old one (and the old EOFToken). Such a procedure can be generated by includegen, which is used to provide the file inclusion feature in Basic.

    If there is an EOFToken and it does not either replace SourceFile, replace EOFToken, or execute stop, \markup will reach end of file and spin.

    FullLine Procedure to be executed when readline has returned true, meaning a newline was encountered, and the line read was of nonzero length. This can represent an entire full line in the input, or the final segment of a line after an embedded PostScript token. The stack contains (a) whatever was on it when \markup was invoked (except for the parameter dictionary), (b) as modified by any embedded PostScript tokens since, and not consumed by HandleToken or HandleResult, and (c) on top, the string just read. In driving a line-at-a-time text setting system, this procedure may invoke PartLine and then a library procedure to set the complete accumulated line; for a paragraph-at-a-time system, this procedure and PartLine will probably be the same. \markup fetches this value only once, so changing it from embedded PostScript code has no effect, but nothing stops the supplied procedure looking at other keys and having changeable behavior.
    HandleResult Procedure to be executed when an embedded token has been read, after HandleToken has been executed, and if HandleToken did not incur a stop. Anticipated uses may check the number and type of objects on the stack, and perhaps add a string on top of stack to the accumulating line, interpret a literal name as a glyph name to be shown, or other conventions appropriate to the application.
    HandleToken Procedure to be executed when an embedded token has been read; any results on the stack may then be processed by HandleResult. One reasonable implementation of HandleToken is a simple unconditional exec. At the time this procedure is executed, SourceFile contains the file object from which the token was read.
    LineBuffer A string that will be used as the buffer for readline. The size of this string determines the maximum line length that can be found in the input without incurring a rangecheck.
    PartLine Procedure to be executed when readline has returned false, meaning it encountered the end of input (or an interruption for embedded PostScript) before a newline. The line may be of zero or nonzero length. The stack contains (a) whatever was on it when \markup was invoked (except for the parameter dictionary), (b) as modified by any embedded PostScript tokens since, and not consumed by HandleToken or HandleResult, and (c) on top, the string just read. \markup fetches this value only once, so changing it from embedded PostScript code has no effect, but nothing stops the supplied procedure looking at other keys and having changeable behavior.
    SourceFile File object from which \markup reads input. If this entry is not initially present, \markup defines it with the file object obtained from currentfile. This value is freshly fetched from the dictionary after any embedded token has been handled, so embedded PostScript tokens may change it.

    Utility procedures

    The resource dictionary includes four utility procedures likely to be useful to other code.

    includegen
    dict includegen proc

    Returns a procedure proc that will implement an “include” facility by manipulating the SourceFile and EOFToken entries in a markup dictionary dict. The returned proc has stack effect (file proc –). It will install file as SourceFile, and install an EOFToken that will restore the previous SourceFile and EOFToken, if any, when it is invoked by \markup at the end of the included file.

    glyphwidth
    name glyphwidth wx wy

    Returns the change in current point that would result from executing name glyphshow. I used to wonder why PostScript had both show and glyphshow, but only stringwidth and no glyphwidth. I got tired of wondering.
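    For readers curious how such a measurement can be made, here is one possible approach in plain PostScript. It is a sketch under assumptions, not the code in the Markup resource: glyphwidth-sketch, dash-wx, and dash-wy are hypothetical names, and the null device is used so that glyphshow moves the current point without marking the page.

```postscript
/glyphwidth-sketch {          % name -> wx wy
  gsave
    matrix currentmatrix      % remember the CTM, because nulldevice resets it
    nulldevice                % nothing shown on the null device reaches the page
    setmatrix                 % measure in the caller's user space
    0 0 moveto
    glyphshow                 % consumes the glyph name and advances the current point
    currentpoint              % the displacement from the origin is the width vector
  grestore
} bind def

/Times-Roman 12 selectfont
/emdash glyphwidth-sketch     % width vector of the em dash in 12-point Times-Roman
/dash-wy exch def
/dash-wx exch def
```

    For a glyph that the font's encoding also reaches through a character code, the result should agree with stringwidth applied to the corresponding one-character string.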

    gr-path
    gr-path

    Read “grestore minus path.” Replaces the current graphics state using all values saved by the matching gsave except the current path, which is preserved. Any current path that was saved by gsave is lost.
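    The effect can be approximated with standard operators, which may help in picturing what is preserved and what is replaced. The sketch below is an assumption-laden imitation, not the resource's implementation: gr-path-sketch is a hypothetical name, it requires a non-empty current path (upath fails otherwise), and uappend may round the translation part of the matrix slightly.

```postscript
/gr-path-sketch {             % - gr-path-sketch -
  matrix currentmatrix        % CTM under which the current path was built
  false upath                 % snapshot of the current path as a user path
  grestore                    % restore everything gsave saved, including its path
  matrix currentmatrix        % the restored CTM
  3 1 roll exch               % restoredCTM userpath oldCTM
  setmatrix                   % briefly reinstate the CTM the snapshot belongs to
  newpath uappend             % discard the restored path and rebuild the snapshot
  setmatrix                   % back to the restored CTM; the path itself does not move
} bind def

newpath 0 0 moveto 5 5 lineto         % the path that gsave captures
gsave
  2 2 scale
  newpath 10 10 moveto 20 30 lineto   % a newer path, built under a scaled CTM
gr-path-sketch
% The scale is gone, but the newer path remains: its end point, 20 30 in the
% scaled space, is now reported by currentpoint as approximately 40 60.
```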

    sgs-path
    gstate sgs-path

    Read “setgstate minus path.” Replaces the current graphics state using all values saved in gstate except the current path, which is preserved. Any current path that was saved in gstate is not lost—it is still in gstate—but does not become the current path.

    Predefined Markup configurations

    The resource dictionary includes prototypes for parameter dictionaries giving two usable configurations of \markup.

    Dump

    This is a parameter dictionary that simply supplies values for PartLine, FullLine, BareLine, HandleToken and HandleResult that will cause \markup to dump to standard output the same input that it reads (PostScript tokens will be as formatted with ==). A 192-byte default LineBuffer is provided. The dictionary is read-only; you should allocate a new dictionary and copy the contents so automatically-added contents like SourceFile, EODCount, and EODString can be added.

    Basic
    num Basic dict

    This is a procedure to produce a parameter dictionary configuring \markup as the simple, line-for-line Basic text formatter described in the introduction. The dictionary is initially configured to set ragged-right with an advance of num units downward between consecutive baselines, though both can be changed in the resulting dictionary before or during use. “Right” and “down” are taken, as in PostScript default coordinates, to be increasing x and decreasing y, respectively. A 192-byte default LineBuffer is provided. The details of the Basic formatter are described in the next section.

    Basic: reference

    Basic is a minimal, line-for-line text formatter that can be driven by Markup. Its versatility stems from its very close relationship to the PostScript language beneath; what happens when arbitrary PostScript sequences are embedded in text being set should (and often does) match what you might expect without thinking too hard.

    Basic in one paragraph

    Basic requires an initial current point to be set, as with moveto. Each line of text is placed with its reference point at the current point: in RagRight mode, the left end of the line is placed there, in RagLeft mode, the right end, and in Center mode, the middle, and the original current point is moved by the Baselines advance. These are special cases for the values of two matrices that completely control line placement. And with that, the operation of Basic is almost completely described—nothing remains but details.

    The details underneath

    Basic is implemented as a parameter dictionary for \markup. An instance of the dictionary is generated by the procedure Basic as described in the previous section. The dictionary has several entries in addition to those required by \markup itself. Some of these entries are procedures that can be used to change settings from embedded PostScript code. To use them that way, it is convenient to put the parameter dictionary on the lookup stack, something \markup does not automatically do:

    14 Basic dup begin userdict begin \markup

    The userdict begin merely arranges that any definitions created in your embedded PostScript do not wind up in the parameter dictionary itself. Be sure to use store and not def when it is your intent to change values in the parameter dictionary: Basic looks only there for its own state.
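    Putting the pieces described so far together, a complete small document might look like the following. This is a sketch that assumes the Markup and MetaPre resources can be found by findresource; the font, the starting point, and the text are arbitrary choices for illustration, and the closing \stop relies on the statement in the reference above that executing stop ends \markup and returns control to ordinary PostScript interpretation.

```postscript
%!PS
% Assumes the Markup and MetaPre resources are available to findresource.
/net.anastigmatix.Markup /ProcSet findresource begin

/Times-Roman 12 selectfont      % an ordinary current font
72 720 moveto                   % Basic requires an initial current point

14 Basic dup begin userdict begin \markup
This line is set ragged right from the current point.
So is this one, \{/Times-Italic 12 selectfont}with italics\{/Times-Roman 12 selectfont} in the middle.
\Center These last two lines are centered
on the starting x position.
\stop
showpage
```

    Note the space after \Center: a name token is not self-delimiting, so it needs whitespace after it, which the scanner consumes, whereas the braces of a procedure delimit themselves.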

    The Basic parameter dictionary defines \markup's required keys BareLine, PartLine, FullLine, and HandleResult all in terms of each other and Basic's two fundamental operations, Track and Place. Track and Place can be used directly from embedded PostScript for particular effects. Consider two similar lines of input:

    \{(M)stringwidth rmoveto}This is a line of text.
    \{(M)stringwidth{rmoveto}Track}This is another line of text.
    

    The first example simply moves the current point one em to the right, essentially behind Basic's back. Because Basic simply places all text according to the current point and advances it with relative moves, the effect is a one-em indent for the current and all following lines, until changed by another rmoveto. You would do this to indent a block of text, or to pick up and move to an entirely new area of the page.

    The second example makes the rmoveto into a tracked element of the current line, which will be compensated when the line is placed and the current point is advanced. The one-em indent will last only for one line. You would use this for a paragraph indent (probably by supplying a definition of BareLine that used it, so the indent would be applied for each bare line in the input).

    An untracked move will have the same effect no matter where in a line it appears—it is simply a change to the current point made behind Basic's back while the line is being assembled and before it is placed. Tracked moves can appear anywhere within lines, and do just what you would think.

    The entries specific to Basic in the parameter dictionary are these:

    Baselines
    num Baselines -

    Set the baseline advance to num (the initial value was given by the num argument to Basic when the dictionary was generated). A positive value advances downward (decreasing y). Modifies the After matrix.

    RagRight
    Center
    RagLeft
    - RagRight -
    - Center -
    - RagLeft -

    Change the line placement mode (RagRight is the default when the dictionary is generated) by modifying the Before and After matrices.

    Track
    proc Track -

    The first part of Basic's internal mechanism for text placement. Track is used to build a queue of procs representing segments of a line. It takes a snapshot of the current graphics state and operand stack, executes proc without marking the page, finds a distance vector from the current point before executing proc to the current point after, restores the graphics state from the snapshot, updates a cumulative distance vector for the line, and adds proc and the snapshot to the queue. The cumulative distance vector is maintained in device space to be independent of changes to user space.

    If proc executes stop, Track does not update the distance vector or add to the queue. This is one way a line-filling formatter could be built on top of Basic: Track each candidate element wrapped in a suitable procedure that will stop if it cannot fit on the current line. The origin is the initial current point when Track executes proc, so following the graphics operations proc can obtain its own distance vector (in user units) with a simple currentpoint. This is one application that would justify designing proc to behave differently when executed by Track and when later executed by Place—a bad idea, in more ordinary circumstances.

    Place
    after before Place -

    The second part of the text placement process. Before executing the queued procs that make up the line, Place transforms the line's cumulative distance vector to user space and then through the matrix before to obtain arguments for an rmoveto. For each proc on the queue, Place then imposes its saved operand stack and graphics state, using sgs-path so the current point is not altered, executes proc, this time marking the page, and cleans the operand stack. After executing the last proc, Place restores the original operand stack and again transforms the line's distance vector, this time using the after matrix to find an rmoveto that reaches the next line's reference point, and resets the distance vector.

    From the description of Track and Place it should be clear that every proc representing a segment of the line is executed twice, first by Track and later, with the same operand stack, by Place. The necessary assumption is that the procs do not depend on side effects or other state than their operands that could change their behavior from one execution to the next. If a proc violates that assumption, surprises may result. Absolute moves are not recommended because the current point is unlikely to be the same both times proc is executed.
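    The measure-first, paint-second idea can be tried in miniature in plain PostScript. The following sketch is a simplification under stated assumptions: measure-sketch and center-sketch are hypothetical names, there is no queue and no snapshot of the operand stack or graphics state, and the null device stands in for whatever Track uses to avoid marking the page.

```postscript
/measure-sketch {                 % proc -> dx dy
  gsave
    matrix currentmatrix nulldevice setmatrix   % same user space, but nothing is painted
    0 0 moveto                    % the origin plays the part of the starting current point
    exec                          % first execution: only the current point moves
    currentpoint                  % the distance vector of this segment
  grestore
} bind def

/center-sketch {                  % proc -> -
  dup measure-sketch              % proc dx dy
  -0.5 mul exch -0.5 mul exch     % proc -dx/2 -dy/2
  rmoveto                         % back up by half the measured distance
  exec                            % second execution: this time the page is marked
} bind def

/Times-Roman 12 selectfont
300 700 moveto
{ (Centered on x = 300) show } center-sketch
```

    As in Basic, the procedure runs twice, so it must behave the same way both times; a procedure containing an absolute moveto would defeat the measurement.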

    Width
    Width wx wy

    Returns the current accumulated width vector of all that has been Tracked and not yet Placed, that is, the vector from the start point of the first item Tracked since the last Place to the end point of the most recent, in units of the user space in effect when Width is executed. This could be used in conjunction with conditional Tracking to implement a line-filling formatter as suggested in the description of Track.

    ResultHandlers

    This entry is a dictionary mapping object type names to procedures, used to customize the behavior of HandleResult after an embedded PostScript token has been executed. If there is at least one item on the stack, and the type of the topmost item is a key in ResultHandlers, then the associated procedure is executed. Three types are initially present: stringtype maps to the procedure PartLine, nametype to the procedure {{glyphshow}Track}, and filetype maps to a procedure generated by includegen.

    Before
    After

    These entries are matrices. Before is used to transform the complete line's distance vector, in user space, to an rmoveto locating the point where the line should begin. From the current point after the line has been placed, After transforms the line's distance vector to an rmoveto locating the reference point for the line to follow. In RagRight mode, the matrices are [0 0 0 0 0 0] and [-1 0 0 -1 0 -num], respectively, where num is the baseline distance given to Baselines or to Basic when the dictionary was generated.
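    The arithmetic can be checked with the standard transform operator, which accepts an explicit matrix. In this worked sketch the distance vector (100, 0) and the advance 14 are assumed values. Only the RagRight matrices come from the text; the Center and RagLeft matrices shown are inferred from the placement rule in "Basic in one paragraph" and may not match the values Basic actually stores.

```postscript
/wx 100 def  /wy 0 def      % assumed distance vector of one line, in user units
/num 14 def                 % assumed baseline advance, as in "14 Basic"

% RagRight, as given in the text.
wx wy [0 0 0 0 0 0] transform              % Before: 0 0, the line begins at the reference point
/rr-before-y exch def  /rr-before-x exch def
wx wy [-1 0 0 -1 0 num neg] transform      % After: -100 -14, back to the start and down one baseline
/rr-after-y exch def   /rr-after-x exch def

% Center, inferred: back up half the line before, half again (plus the advance) after.
wx wy [-0.5 0 0 -0.5 0 0] transform        % Before: -50 0
/c-before-y exch def   /c-before-x exch def
wx wy [-0.5 0 0 -0.5 0 num neg] transform  % After: -50 -14
/c-after-y exch def    /c-after-x exch def

% RagLeft, inferred: back up the whole line before, only the advance after.
wx wy [-1 0 0 -1 0 0] transform            % Before: -100 0
/rl-before-y exch def  /rl-before-x exch def
wx wy [0 0 0 0 0 num neg] transform        % After: 0 -14
/rl-after-y exch def   /rl-after-x exch def
```

    In every mode the two rmovetos plus the line's own distance vector add up to (0, -num), so successive reference points descend in a straight column.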

    Examples

    Here are three examples of Basic in use; you can view them in a PostScript viewer or a text editor, depending on whether you would like to see how they look on the page or how they were written. Each one is in two versions, one with all resources included, and one that does not include them and can be viewed only if your viewer or printer already can find the MetaPre and Markup resources, as discussed in the reference section.

    The “labeler demo” was inspired by this newsgroup thread in which the original poster wanted a simple PostScript template that a PHP script could emit to make a simple label, and some of the suggested solutions had the PHP script emit LaTeX code, to be postprocessed with LaTeX and dvips. For a label!

    Example          Bare file                    Resources included
    Sampler          PostScript view, text view   PostScript view, text view, PDF view
    Business letter  PostScript view, text view   PostScript view, text view, PDF view
    Labeler demo     PostScript view, text view   PostScript view, text view, PDF view

    $Id: Markup.html,v 1.14 2009/11/12 03:16:30 chap Exp $