Markup is a procedure-set resource for the PostScript
language that changes the form of input to free lines of text that
can be interrupted by fragments of PostScript programming. The obvious
application is text formatting, but Markup can be adapted
to many jobs that involve reading material line-by-line.
Quite sophisticated resources for in-PostScript text formatting exist, as
can be seen in my direct PostScript resources survey. They offer
full justification, tables, columns, and many other capabilities associated
with complete text typesetting systems. Those systems run 40 to 200 kilobytes
and more of interpreter memory, and Markup, at about eight, is
not intended to replace but to complement them. It has no preordained idea
what to do with the lines it reads, but can be linked to the procedures of
a sophisticated typesetting resource to drive it with a
convenient form of input. It works well in this capacity with
the TinyDict,
which does not have free-text input provisions of its own.
Markup does include simple provisions, usable without any
larger typesetting library, for line-for-line
setting of text (that is, producing one set line from each input line,
without filling words to fit), ragged right, ragged left, or centered,
and Markup's especially simple relationship to the underlying
PostScript language means those standalone facilities are versatile enough
for everyday business correspondence, promotional flyers, labeling of figures,
and other jobs that do not demand the greater automation the elaborate
systems provide. There is no plan to add significantly to these built-in
capabilities, as Markup is meant always to be
lightweight enough to be an attractive front end for other libraries and,
by not being tied to any one in particular, to stimulate use and development
of new and existing libraries built on it or usable with it.
Markup's standalone capabilities are meant to be adequate for a
range of simple tasks but never to become the main point.
Markup is intended to stake out a distinctive
position in the relationship of the markup language to the underlying
PostScript. Some of the typesetting libraries I have surveyed introduce markup
codes with an all new look and new rules for scanning and syntax, new
mechanisms for defining commands or selecting fonts, and so on.
Markup strives to avoid reinventing anything that PostScript
already does easily and well, and to behave as nearly as possible as a
natural outgrowth of PostScript. This example is written for
Markup with its own built-in Basic formatter:
Markup may offer no dedicated new codes for, for example, font switching but will certainly accept \{/Times-Italic 12 selectfont}the appropriate ordinary PostScript\{/Times-Roman 12 selectfont} anywhere in the input. The effect depends on the back-end typesetting library in use, but if it follows the same philosophy of transparency, as Markup's Basic certainly does, this will do just what you expect.
If there are only a few font changes in a one-off document, it would be hard to beat that form for clarity: there is no burden of learning or remembering new commands for font selection, or which fonts have been assigned to them. In a longer document, or where a consistent style is important, it will make sense to define some compact abbreviations. Now style changes can be made in one place. But that doesn't get any easier than PostScript already makes it:
\{
/ro {currentfont /Times-Roman 12 selectfont} bind def
/it {currentfont /Times-Italic 12 selectfont} bind def
/last {setfont} bind def
}\ro This text is in Roman; \it this is emphasized, but to \ro really \last emphasize something that's already in italics, one sometimes goes back to Roman. \last It is easy to set up a \/quoteleft last \/quoteright command when the stack behaves as expected. I like using the standard PostScript glyph names for the quotes rather than remembering to write \<60>last\<27> in StandardEncoding, or \<91>last\<92> in CE encoding\/emdash but that's a matter of preference.
Markup's clear family resemblance to PostScript is a result
of its PostScript-like scanning rules, which result from its use of
PostScript's own scanning operators, with rules as simple as can be:
- Markup reads lines. That is, it uses the
  readline operator, which imposes the usual PostScript rule making
  the line endings of various operating systems
  equivalent. The line-ending characters are not included in the returned
  string. That solves a common annoyance in getting text handling right
  across platforms, and does so with PostScript's own mechanism.
- If readline stops before reaching a line end, because the input was
  interrupted or has ended, the text read so far is a PartLine.
  Otherwise, the line is a FullLine if there is anything on it, or
  a BareLine if there isn't. Markup is configured by a
  dictionary for what to do in each case; that is how it is set up to drive any
  given typesetting library. Typically BareLine is hooked to
  whatever makes the library start a new paragraph, so running paragraphs of
  text can be separated simply by bare lines.
- After a PartLine, Markup reads a single token, using
  PostScript's token operator. Therefore the interruption can be any
  single PostScript token, for example an integer or a name, a string, a hex
  string, or an entire bracketed procedure of any length; PostScript's rules for
  white space and comments apply. What to do with the token, and what to do with
  any result of doing that, are configured in the dictionary; reasonable
  conventions are to execute procedures, add strings to the current line or
  paragraph, pass literal names to glyphshow, and leave other things on
  the stack alone (to allow uses like the currentfont stacking example).
  Then Markup resumes reading lines.

To express those rules in PostScript took even fewer words than to describe them here. A few natural consequences are worth mentioning:
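- A line that begins with an interruption yields a zero-length PartLine.
- A self-delimiting token at the very end of a line is followed by a
  BareLine: the scanner, resuming after the
  interruption, finds zero characters and then the line end. To avoid an
  unwanted paragraph break, move the ending delimiter to the beginning of the
  next line: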
The token at the end of this line\{0.5 setgray}
causes an unwanted paragraph break, but this one\{0.5 setgray
}does not. There is no unwanted break with this glyph name\/emdash
because the newline was consumed in terminating it.
- To include a literal backslash in the text, embed it as a string token:
This line contains \(\\) a backslash.
If that is too uncomfortable, the name \ could be defined (just once) as the string (\\):
\{ /\ (\\) def }This line contains \\  one too. Don't forget the extra space.
If there are lots of backslashes to be included, perhaps it is worth changing
EODString in Markup's configuration
dictionary to some other value. A large block of verbatim text can be
included by temporarily setting EODString to some unlikely
value like ThisIsTheEndOfAllThatVerbatimText.
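
As a rough illustration of that last point, the input below switches EODString for one block and then restores it. It is a sketch under assumptions, not taken from the Markup distribution: it assumes the parameter dictionary is on the dictionary stack (so that store can reach EODString) and that HandleToken simply executes the token. The paths in the verbatim text are made up. Because free lines are text, the comments sit inside the embedded procedures, where PostScript's own comment rules apply.

```postscript
Ordinary text, where a backslash interrupts.\{
  % Hypothetical: switch the delimiter for a verbatim block.
  /EODString (ThisIsTheEndOfAllThatVerbatimText) store
}C:\fonts\times.pfb and \\server\share pass through untouched.
ThisIsTheEndOfAllThatVerbatimText{
  % Restore the default delimiter, a single backslash.
  /EODString (\\) store
}Back to normal: a backslash interrupts again.
```

Note that the token following the long delimiter is not preceded by a backslash: while the unlikely string is the delimiter, it alone marks the interruption.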
Markup is a ProcSet resource.
To make it available to your own code, include in the setup section of
your file:
/net.anastigmatix.Markup /ProcSet findresource begin
The findresource will succeed if you have made
the Markup resource file
available in any of these ways:
- You include the resource file in the prolog of your document, ahead of the
  findresource (which belongs in the setup section).
- You use %%DocumentNeededResources and %%IncludeResource
  DSC comments: you include these comments at
  the right position in your file to specify that it needs
  net.anastigmatix.Markup, your document manager software is configured to
  automatically insert needed resources in files being printed, and you have
  put the Markup resource file where your document manager can find
  it.
Markup relies on another resource, MetaPre
(the eight kilobyte memory figure given earlier is the total for both).
You will need that file also. If you use the first method, you should
include both files in the prolog of your document, MetaPre
first. The other methods should Just Work as long as both files are where
they need to be. In any case, your document only needs the single
findresource line shown above. A findresource for MetaPre
is not needed unless you also use MetaPre
features in your own document. You pay no
penalty to do so, as the resource must be there anyway.
The resource files are in a compact form. That is for efficiency, not to keep you from viewing them; there is a script for that on the resource packaging page.
The Markup dictionary is read-only. Before creating any
definitions, you will want either userdict begin or your
own dict begin so that you have a writable dictionary on top
of the dictionary stack.
This section describes the contents of the read-only dictionary that is
returned by /net.anastigmatix.Markup /ProcSet findresource.
The dictionary contains one definition that implements Markup
itself, reading and processing free text input as described in the
introduction.
\markup configures itself according to the supplied parameter dictionary dict and begins reading and processing input until it reaches end of input, or the PostScript operator stop is executed. It reads from the file given as SourceFile in the parameter dictionary, supplying a definition based on currentfile if the entry is not present. If currentfile is the source, ordinary PostScript interpretation of the file resumes after a stop is executed.
Lines are read by readline. If readline reads a complete line (terminated by newline), the line is a FullLine if it has nonzero length, a BareLine otherwise, and the corresponding procedure is executed with the line on the stack.
If readline does not read a complete line, either the end of input has been reached, or an interruption has been reached, defined by the values EODCount and EODString. The PartLine procedure is executed with the partial line read on the stack, and then the token operator is used to read a single token from SourceFile.
If token succeeds, the procedure HandleToken is executed with the token on the stack. If HandleToken returns (without executing stop), the procedure HandleResult is executed. If that also returns without executing stop, \markup freshly fetches SourceFile, EODCount, and EODString from the parameter dictionary in case their values were changed in handling the token, and resumes reading lines from the (possibly changed) SourceFile.
If token returns false, the end of SourceFile has been reached. If no EOFToken entry is present in the dictionary, \markup completes. If an EOFToken is present, \markup executes HandleToken and HandleResult just as if that token had been read and, if stop has not been executed, freshly fetches SourceFile, EODCount, and EODString, and resumes reading from the (presumably changed) SourceFile.
The parameter dictionary passed to \markup may be crafted for some underlying library for typesetting (or even some other purpose), to cause \markup to execute the appropriate procedures of that library. The dictionary may contain additional entries controlling the interface to that library and of no interest to \markup. The entries that are meaningful to \markup are described here:
Key | Purpose |
---|---|
BareLine | Procedure to be executed when readline has returned true, meaning a newline was encountered, and the line read was of zero length. This can happen for an actual bare line in the input, or for the final readline of a line that ends with a self-delimiting PostScript token. For consistency with FullLine and PartLine, the stack contains (a) whatever was on it when \markup was invoked (except for the parameter dictionary), (b) as modified by any embedded PostScript tokens since, and not consumed by HandleToken or HandleResult, and (c) on top, the string just read, though in this case it is known to be empty; in some uses (simple line-by-line setting), it can make sense for BareLine and FullLine to be the same procedure. When used with paragraph-at-a-time systems, BareLine will usually be defined to invoke the library procedure for filling and setting the completed paragraph. \markup fetches this value only once, so changing it from embedded PostScript code has no effect, but nothing stops the supplied procedure looking at other keys and having changeable behavior. |
EODCount | An integer that, in combination with EODString, determines how embedded-PostScript “interruptions” are recognized. Determines how many instances of EODString can be encountered in the input before reading is interrupted; overlapping instances are not multiply counted. If the value is zero, the first occurrence of EODString interrupts reading and is not read as part of the text. If this entry is not present in the dictionary, \markup adds a definition of zero. If EODString is empty, reading will be interrupted when exactly EODCount bytes have been read; this combination is not likely to have practical use in \markup. This value is freshly fetched from the dictionary after any embedded token has been handled, so embedded PostScript tokens may change it. |
EODString | A string that, in combination with EODCount, determines how embedded-PostScript “interruptions” are recognized. See EODCount. If this entry is not present in the dictionary, \markup adds a definition of (\\). This value is freshly fetched from the dictionary after any embedded token has been handled, so embedded PostScript tokens may change it. |
EOFToken (optional) | When the end of SourceFile has been reached, ordinarily \markup completes. If this entry is present, its value is treated just as if it had been returned by token at the end of the file; HandleToken and HandleResult are executed, and if stop was not executed, reading resumes from the (presumably changed) SourceFile. This is the mechanism by which a file-inclusion operation can be supplied. The idea is for the file-include procedure to store the new file in SourceFile and supply an EOFToken that restores the old one (and the old EOFToken). Such a procedure can be generated by includegen, which is used to provide the file inclusion feature in Basic. If there is an EOFToken and it does not either replace SourceFile, replace EOFToken, or execute stop, \markup will reach end of file and spin. |
FullLine | Procedure to be executed when readline has returned true, meaning a newline was encountered, and the line read was of nonzero length. This can represent an entire full line in the input, or the final segment of a line after an embedded PostScript token. The stack contains (a) whatever was on it when \markup was invoked (except for the parameter dictionary), (b) as modified by any embedded PostScript tokens since, and not consumed by HandleToken or HandleResult, and (c) on top, the string just read. In driving a line-at-a-time text setting system, this procedure may invoke PartLine and then a library procedure to set the complete accumulated line; for a paragraph-at-a-time system, this procedure and PartLine will probably be the same. \markup fetches this value only once, so changing it from embedded PostScript code has no effect, but nothing stops the supplied procedure looking at other keys and having changeable behavior. |
HandleResult | Procedure to be executed when an embedded token has been read, after HandleToken has been executed, and if HandleToken did not incur a stop. Anticipated uses may check the number and type of objects on the stack, and perhaps add a string on top of stack to the accumulating line, interpret a literal name as a glyph name to be shown, or other conventions appropriate to the application. |
HandleToken | Procedure to be executed when an embedded token has been read; any results on the stack may then be processed by HandleResult. One reasonable implementation of HandleToken is a simple unconditional exec. At the time this procedure is executed, SourceFile contains the file object from which the token was read. |
LineBuffer | A string that will be used as the buffer for readline. The size of this string determines the maximum line length that can be found in the input without incurring a rangecheck. |
PartLine | Procedure to be executed when readline has returned false, meaning it encountered the end of input (or an interruption for embedded PostScript) before a newline. The line may be of zero or nonzero length. The stack contains (a) whatever was on it when \markup was invoked (except for the parameter dictionary), (b) as modified by any embedded PostScript tokens since, and not consumed by HandleToken or HandleResult, and (c) on top, the string just read. \markup fetches this value only once, so changing it from embedded PostScript code has no effect, but nothing stops the supplied procedure looking at other keys and having changeable behavior. |
SourceFile | File object from which \markup reads input. If this entry is not initially present, \markup defines it with the file object obtained from currentfile. This value is freshly fetched from the dictionary after any embedded token has been handled, so embedded PostScript tokens may change it. |
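
To show how these keys fit together, here is a minimal sketch of a hand-built parameter dictionary that merely echoes its input to standard output. It is an illustration under assumptions rather than code from the resource: it assumes the Markup and MetaPre resources can be found, relies on the documented defaults for SourceFile, EODCount, and EODString, and uses stop to hand control back to the interpreter.

```postscript
%!PS
% Sketch only: assumes the Markup and MetaPre resources can be found.
/net.anastigmatix.Markup /ProcSet findresource begin
userdict begin
<<
  /LineBuffer   192 string   % longest line accepted without a rangecheck
  /PartLine     { print }    % text before an interruption: no newline yet
  /FullLine     { = }        % text that ends a line: print it with a newline
  /BareLine     { = }        % empty string: prints just a newline
  /HandleToken  { exec }     % run the embedded token
  /HandleResult { }          % leave any results on the stack
>>
\markup
A plain line.
A line with \{(an embedded string) print} inside.
\{stop}
end end
```

The dictionary is built fresh with << >> so that it is writable, since \markup adds the entries it finds missing.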
The resource dictionary includes four utility procedures likely to be useful to other code.
Returns a procedure proc that will implement an “include” facility by manipulating the SourceFile and EOFToken entries in a markup dictionary dict. The returned proc has stack effect (file proc –). It will install file as SourceFile, and install an EOFToken that will restore the previous SourceFile and EOFToken, if any, when it is invoked by \markup at the end of the included file.
Returns the change in current point that would result from executing name glyphshow. I used to wonder why PostScript had both show and glyphshow, but only stringwidth and no glyphwidth. I got tired of wondering.
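
For readers curious how such a measurement can be made with standard operators, the following is one possible approach, not necessarily the one the resource uses. The name glyphwidth-sketch is hypothetical. It shows the glyph on the null device, so nothing is painted, and reads back the movement of the current point.

```postscript
%!PS
% Hypothetical stand-in, not the resource's own code.
% Stack effect: name glyphwidth-sketch -> wx wy
/glyphwidth-sketch {
  gsave
    matrix currentmatrix     % remember the CTM
    nulldevice               % marks made from here on are discarded
    setmatrix                % keep the same user space on the null device
    newpath 0 0 moveto
    glyphshow                % consumes the name, moves the current point
    currentpoint             % displacement from 0 0, in user units
  grestore                   % restores device, CTM, and current path
} bind def

/Times-Roman 12 selectfont
/emdash glyphwidth-sketch    % leaves wx wy on the stack
exch = =                     % prints wx, then wy
```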
Read “grestore minus path.” Replaces the current graphics state using all values saved by the matching gsave except the current path, which is preserved. Any current path that was saved by gsave is lost.
Read “setgstate minus path.” Replaces the current graphics state using all values saved in gstate except the current path, which is preserved. Any current path that was saved in gstate is not lost—it is still in gstate—but does not become the current path.
The resource dictionary includes prototypes for parameter dictionaries giving two usable configurations of \markup.
This is a parameter dictionary that simply supplies values for PartLine, FullLine, BareLine, HandleToken and HandleResult that will cause \markup to dump to standard output the same input that it reads (PostScript tokens will be as formatted with ==). A 192-byte default LineBuffer is provided. The dictionary is read-only; you should allocate a new dictionary and copy the contents so automatically-added contents like SourceFile, EODCount, and EODString can be added.
This is a procedure to produce a parameter dictionary configuring
\markup as the simple, line-for-line Basic text
formatter described in the introduction. The dictionary is initially
configured to set ragged-right with an advance of num units
downward between consecutive baselines, though both can be changed in the
resulting dictionary before or during use. “Right” and
“down” are taken, as in PostScript default coordinates, to be
increasing x and decreasing y, respectively. A 192-byte
default LineBuffer is provided.
The details of the Basic formatter are described in the next section.
Basic is a minimal, line-for-line text formatter that can be
driven by Markup. Its versatility stems from its very close
relationship to the PostScript language beneath; what happens when arbitrary
PostScript sequences are embedded in text being set should (and often does)
match what you might expect without thinking too hard.
Basic requires an initial current point to be set, as with
moveto. Each line of text is placed with its reference point at the
current point: in RagRight mode, the left end of the line is placed
there; in RagLeft mode, the right end; and in Center mode,
the middle. The original current point is then moved by the Baselines
advance. These are special cases for the values of two matrices that
completely control
line placement. And with that, the operation of Basic is almost
completely described; nothing remains but details.
Basic is implemented as a parameter dictionary
for \markup. An instance of the dictionary is generated by the
procedure Basic as described in the previous section. The dictionary
has several entries in addition to those required by \markup itself.
Some of these entries are procedures that can be used to change settings
from embedded PostScript code. To use them that way, it is convenient to
put the parameter dictionary on the lookup stack, something \markup
does not automatically do:
14 Basic dup begin userdict begin \markup
The userdict begin merely arranges that any definitions created
in your embedded PostScript do not wind up in the parameter dictionary itself.
Be sure to use store and not def when it is your
intent to change values in the parameter dictionary: Basic looks
only there for its own state.
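
Put together, a complete small document might look like the sketch below. It is an illustration assembled from the descriptions on this page, not a tested sample from the distribution: it assumes the Markup and MetaPre resources can be found, that Center is callable from embedded code as the mode entries described later suggest, and that stop is used to return to ordinary interpretation.

```postscript
%!PS
% Sketch only: assumes the Markup and MetaPre resources can be found.
/net.anastigmatix.Markup /ProcSet findresource begin
/Times-Roman 12 selectfont
306 720 moveto               % Basic needs an initial current point
14 Basic dup begin userdict begin \markup
\{Center}A centered heading
\{/Times-Italic 12 selectfont}This line is set in italics.
\{/Times-Roman 12 selectfont}This one is back in Roman.
\{stop}
end end end
showpage
```

The three end operators undo the begin of userdict, of the parameter dictionary, and of the Markup dictionary, in that order.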
The Basic parameter dictionary defines \markup's
required keys BareLine, PartLine, FullLine, and
HandleResult all in terms of each other and Basic's
two fundamental operations, Track and Place. Track
and Place can be used directly from embedded PostScript for particular
effects. Consider two similar lines of input:
\{(M)stringwidth rmoveto}This is a line of text.
\{(M)stringwidth{rmoveto}Track}This is another line of text.
The first example simply moves the current point one em to the right,
essentially behind Basic's back. Because Basic
simply places all text according to the current point and advances it with
relative moves, the effect is a one-em indent for the current and all
following lines, until changed by another rmoveto. You would do this
to indent a block of text, or to pick up and move to an entirely new area
of the page.
The second example makes the rmoveto into a tracked element of the current line, which will be compensated when the line is placed and the current point is advanced. The one-em indent will last only for one line. You would use this for a paragraph indent (probably by supplying a definition of BareLine that used it, so the indent would be applied for each bare line in the input).
An untracked move will have the same effect no matter where in a line
it appears: it is simply a change to the current point made behind
Basic's back while the line is being assembled and before it
is placed. Tracked moves
can appear anywhere within lines, and do just what you would think.
The entries specific to Basic in the parameter dictionary are
these:
Set the baseline advance to num (the initial value was given by the num argument to Basic when the dictionary was generated). A positive value advances downward (decreasing y). Modifies the After matrix.
Change the line placement mode (RagRight is the default when the dictionary is generated) by modifying the Before and After matrices.
The first part of Basic's internal mechanism for text
placement. Track is used to build a queue of procs
representing segments of a line. It takes a snapshot of the current
graphics state and operand stack, executes proc without marking the
page, finds a distance vector from the current point before executing
proc to the current point after, restores the graphics state from
the snapshot, updates a cumulative distance vector for the line, and adds
proc and the snapshot to the queue. The cumulative distance vector
is maintained in device space to be independent of changes to user space.
If proc executes stop, Track does not update the
distance vector or add to the queue. This is one way a line-filling
formatter could be built on top of Basic: Track
each candidate element wrapped in a suitable procedure that will
stop if it cannot fit on the current line. The origin
is the initial current point when Track executes proc,
so following the graphics operations proc can obtain its own
distance vector (in user units) with a simple currentpoint.
This is one application that would justify designing proc
to behave differently when executed by Track and when later
executed by Place (a bad idea in more ordinary circumstances).
The second part of the text placement process. Before executing the queued procs that make up the line, Place transforms the line's cumulative distance vector to user space and then through the Before matrix to obtain arguments for an rmoveto. For each proc on the queue, Place then imposes its saved operand stack and graphics state, using sgs-path so the current point is not altered, executes proc, this time marking the page, and cleans the operand stack. After executing the last proc, Place restores the original operand stack and again transforms the line's distance vector, this time using the After matrix to find an rmoveto that reaches the next line's reference point, and resets the distance vector.
From the description of Track and Place it should be clear that every proc representing a segment of the line is executed twice, first by Track and later, with the same operand stack, by Place. The necessary assumption is that the procs do not depend on side effects or other state than their operands that could change their behavior from one execution to the next. If a proc violates that assumption, surprises may result. Absolute moves are not recommended because the current point is unlikely to be the same both times proc is executed.
Returns the current accumulated width vector of all that has been Tracked and not yet Placed, that is, the vector from the start point of the first item Tracked since the last Place to the end point of the most recent, in units of the user space in effect when Width is executed. This could be used in conjunction with conditional Tracking to implement a line-filling formatter as suggested in the description of Track.
This entry is a dictionary mapping object type names to procedures,
used to customize the behavior of HandleResult after an embedded
PostScript token has been executed. If there is at least one item on
the stack, and the type of the topmost item is a key in
ResultHandlers, then the associated procedure is executed.
Three types are initially present: stringtype maps to the
procedure PartLine, nametype to the procedure
{{glyphshow}Track}, and filetype
maps to a procedure generated by includegen.
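
Given the filetype handler, including another file from Basic input should need nothing more than leaving a file object on the stack. The fragment below is a sketch of such input under that reading; the file name chapter2.txt is made up, and the fragment is Markup input rather than a standalone program.

```postscript
Text set before the included file.
\{
  % Hypothetical file name; any readable text file would do.
  (chapter2.txt) (r) file
}Text set after the included file has been read to its end.
```

The closing brace is placed at the start of the following line so that no bare line is produced between the interruption and the included text.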
These entries are matrices. Before is used to transform the
complete line's distance vector, in user space, to an rmoveto
locating the point where the line should begin. From the current point
after the line has been placed, After transforms the line's
distance vector to an rmoveto locating the reference point for
the line to follow. In RagRight mode, the matrices are
[0 0 0 0 0 0] and [-1 0 0 -1 0 -num],
respectively, where num is the baseline distance given to
Baselines or to Basic when the dictionary was generated.
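
To make the RagRight values concrete, the snippet below pushes a sample distance vector through both matrices with the standard transform operator. It uses only built-in PostScript. The line width of 100 units and the names before, after, and num are made up for the illustration, and Place may well apply the matrices by other means.

```postscript
%!PS
/num 14 def                          % baseline advance (sample value)
/before [0 0 0 0 0 0] def            % RagRight Before, as given above
/after  [-1 0 0 -1 0 num neg] def    % RagRight After: [-1 0 0 -1 0 -num]

% A line whose tracked segments add up to 100 units to the right:
100 0 before transform exch = =      % x' = 0, y' = 0: begin at the current point
100 0 after  transform exch = =      % x' = -100, y' = -14: back to the left end, one line down
```

With Before mapping every vector to zero, the line starts exactly at the reference point; After then undoes the line's own advance and adds the baseline step.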
Here are three examples of Basic in use; you can view them in
a PostScript viewer or a text editor, depending on whether you would like
to see how they look on the page or how they were written. Each one is in
two versions, one with all resources included, and one that does not
include them and can be viewed only if your viewer or printer already can
find the MetaPre and Markup resources, as discussed
in the reference section.
The “labeler demo” was inspired by this newsgroup thread in which the original poster wanted a simple PostScript template that a PHP script could emit to make a simple label, and some of the suggested solutions had the PHP script emit LaTeX code, to be postprocessed with LaTeX and dvips. For a label!
Example | Bare file | Resources included | |
---|---|---|---|
Sampler | PostScript view, text view | PostScript view, text view | PDF view |
Business letter | PostScript view, text view | PostScript view, text view | PDF view |
Labeler demo | PostScript view, text view | PostScript view, text view | PDF view |