DATE: 1998-12-01

L2/98-406

DOC TYPE:

Expert contribution

TITLE:

Proposal to encode mathematical variant tags

SOURCE:

Murray Sargent III

PROJECT:

 

STATUS:

Proposal

ACTION ID:

FYI

DUE DATE:

--

DISTRIBUTION:

Worldwide

MEDIUM:

Paper and html

NO. OF PAGES:

4


A. Administrative

1. Title

Proposal to encode mathematical variant tags

2. Requester's name

Murray Sargent III

3. Requester type

Expert request.

4. Submission date

1998-12-01

5. Requester’s reference

Scientific and Technical Information Exchange (STIX)

6a. Completion

Complete proposal

6b. More information to be provided?

If requested

 

B. Technical -- General

1a. New script? Name?

No.

1b. Addition of characters to existing block? Name?

No.

2. Number of characters

16

3. Proposed category

 

4. Proposed level of implementation and rationale

Level 3, since math variant tags qualify the base letter they follow.

5a. Character names included in proposal?

10 are defined; reserving the remaining 6 is recommended, giving a group of 16.

5b. Character names in accordance with guidelines?

Yes.

5c. Character shapes reviewable?

 

6a. Who will provide computerized font?

None needed

6b. Font currently available?

None needed

6c. Font format?

na

7a. Are references (to other character sets, dictionaries, descriptive texts, etc.) provided?

Yes.

7b. Are published examples (such as samples from newspapers, magazines, or other sources) of use of proposed characters attached?

Not attached, but available.

8. Does the proposal address other aspects of character data processing?

No

 

C. Technical -- Justification

1. Contact with the user community?

Yes. Patrick Ion, Barbara Beeton, Murray Sargent III

2. Information on the user community?

Professional mathematicians, physicists, astronomers, engineers, and other scientific and technical researchers.

3a. The context of use for the proposed characters?

Used in publication of research mathematics and other hard sciences.

3b. Reference

 

4a. Proposed characters in current use?

Yes.

4b. Where?

Worldwide, by scientific and technical publishers.

5a. Characters should be encoded entirely in BMP?

Yes.

5b. Rationale

Accurate publication of mathematical and scientific research on the Web is impossible without a comprehensive and accurate collection of symbols including various alphabetic variants in common use. Allocation in the BMP is in accordance with the Roadmap.

6. Should characters be kept in a continuous range?

Yes

7a. Can the characters be considered a presentation form of an existing character or character sequence?

No. A math variant tag modifies the base character it follows in a way that changes that character’s semantics: a base character followed by a math variant tag is a different character from the same base character without one.

7b. Where?

 

7c. Reference

 

8a. Can any of the characters be considered to be similar (in appearance or function) to an existing character?

No

8b. Where?

 

8c. Reference

 

9a. Combining characters or use of composite sequences included?

Yes

9b. List of composite sequences and their corresponding glyph images provided?

A list is provided below, but the corresponding glyphs are well known and are omitted.

10. Characters with any special properties such as control function, etc. included?

All the characters are modifier characters, which gives them a kind of control function.

 

D. SC2/WG2 Administrative

To be completed by SC2/WG2

1. Relevant SC 2/WG 2 document numbers:

 

2. Status (list of meeting number and corresponding action or disposition)

 

3. Additional contact to user communities, liaison organizations etc.

 

4. Assigned category and assigned priority/time frame

 

Other Comments

 

 

Mathematics needs a number of Latin and Greek alphabets that at first glance appear to be font variations of one another, e.g., normal, bold, italic and script H. However, in any given document, these characters have distinct mathematical semantics. For example, a normal H represents a different variable from a bold H. If one drops these distinctions in plain text, one gets gibberish. Instead of the well-known Hamiltonian formula

 

$$\mathcal{H} = \int d\tau\,(\varepsilon\mathbf{E}^2 + \mu\mathbf{H}^2),$$

you’d get the integral equation (!)

$$H = \int d\tau\,(\varepsilon E^2 + \mu H^2).$$

 

Accordingly, the STIX project requests adding normal, bold, italic, script, etc., Latin and Greek alphabets. Encoding each alphabet directly would take many characters and would lose some useful common information; for example, the variants of H might no longer be recognizable as H’s. Direct encoding does, however, allow plain text to retain the proper character semantics, and it allows simple (nonrich) search methods to work.

 

A more useful encoding, which still allows simple search algorithms to work, employs “math variant tags” that act in some ways like nonspacing combining marks. For example, a math script H would be encoded as H<math script>. On encountering such a combination, a rendering engine should choose some script font to render the H; which script font it chooses is beyond the scope of plain text.
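The search behavior of such tagged text can be sketched in Python. Because the proposal assigns no code points, this sketch assumes, purely for illustration, that tag n of the list of tag values given below is stored as the Private Use Area character U+E000 + n; the constants and function names are hypothetical. It shows that an exact search for one variant must not stop at a character that carries a tag, just as with combining marks, while stripping the tags reduces every variant of H to plain H.

```python
# Hypothetical encoding for illustration only: the proposal assigns no
# code points, so tag n is modeled as the Private Use Area char U+E000 + n.
TAG_BASE = 0xE000
TAG_COUNT = 16  # size of the proposed block
MATH_BOLD = chr(TAG_BASE + 1)
MATH_SCRIPT = chr(TAG_BASE + 2)

def is_tag(ch: str) -> bool:
    """True if ch is one of the hypothetical math variant tags."""
    return TAG_BASE <= ord(ch) < TAG_BASE + TAG_COUNT

def find_exact(text: str, pattern: str) -> list[int]:
    """Offsets where pattern occurs and is NOT followed by a tag, the same
    boundary rule a search applies to nonspacing combining marks."""
    hits = []
    start = text.find(pattern)
    while start != -1:
        end = start + len(pattern)
        if end == len(text) or not is_tag(text[end]):
            hits.append(start)
        start = text.find(pattern, start + 1)
    return hits

def strip_tags(text: str) -> str:
    """Drop every tag, so all variants of a letter compare equal."""
    return "".join(ch for ch in text if not is_tag(ch))

# Script H on the left; bold E and bold H inside the integral.
formula = "H" + MATH_SCRIPT + " = ∫dτ(εE" + MATH_BOLD + "² + μH" + MATH_BOLD + "²)"
print(find_exact(formula, "H"))                # []: every H here carries a tag
print(find_exact(formula, "H" + MATH_SCRIPT))  # [0]
print(find_exact(formula, "H" + MATH_BOLD))    # [17]
print(strip_tags(formula))                     # H = ∫dτ(εE² + μH²)
```

Note that stripping the tags reproduces exactly the ambiguous plain-text form of the formula, which is why the tags must be preserved for semantics but may be ignored for a loose, variant-insensitive match.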

 

By default, math alphabetic characters would be considered Roman characters (serif, not bold, not italic). To override this default, I propose reserving a block of 16 math variant tags with the following values defined:

0. math italic
1. math bold
2. calligraphic (script)
3. fraktur
4. open-face
5. sans-serif
6. monospace
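The list above can be read as a small lookup table. The following Python sketch shows one way a renderer might resolve a base character and its trailing tag numbers into style attributes, falling back to Roman when no tag is present. The class and function names are hypothetical; the sketch assumes the tag numbers are exactly the list positions 0 to 6 and treats 7 to 15 as reserved. Choosing an actual font is still left to the rendering engine.

```python
from enum import IntEnum

class MathTag(IntEnum):
    """Math variant tags, numbered as in the list above; 7-15 are reserved."""
    ITALIC = 0
    BOLD = 1
    SCRIPT = 2       # calligraphic
    FRAKTUR = 3
    OPEN_FACE = 4
    SANS_SERIF = 5
    MONOSPACE = 6

BLOCK_SIZE = 16
DEFINED = {tag.value for tag in MathTag}

def describe(base: str, tags: list[int]) -> str:
    """Name the style attributes a renderer should apply to base.
    With no tags, the default is Roman: serif, not bold, not italic."""
    if not tags:
        return f"{base}: roman"
    names = []
    for t in tags:
        if not 0 <= t < BLOCK_SIZE:
            raise ValueError(f"{t} is outside the {BLOCK_SIZE}-tag block")
        # Reserved values have no defined meaning yet.
        names.append(MathTag(t).name.lower() if t in DEFINED else f"reserved-{t}")
    return f"{base}: " + " ".join(names)

print(describe("H", []))      # H: roman
print(describe("H", [2]))     # H: script
print(describe("H", [0, 1]))  # H: italic bold
```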

 

Zero or more such tags can follow a base character, so a math bold italic H could be encoded either as H<math italic><math bold> or as H<math bold><math italic>. For the simplest “math-unaware” search algorithms to match a given string, it is desirable to standardize on one order, namely the order of the list above. A slightly more sophisticated algorithm, however, can encode the tags as bits and match them in any order.
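The two matching strategies can be compared in a short Python sketch. Nothing here comes from the proposal itself: the tag code points (U+E000 + n, in the Private Use Area) and all function names are hypothetical. `canonicalize` rewrites each tag run into the order of the list above, so a math-unaware substring search sees one spelling per variant, and `tagged_chars` records the tags on each base character as bits, so that tag order does not matter.

```python
# Hypothetical encoding for illustration only: the proposal assigns no
# code points, so tag n is modeled as the Private Use Area char U+E000 + n.
TAG_BASE = 0xE000
TAG_COUNT = 16  # size of the proposed block
MATH_ITALIC = chr(TAG_BASE + 0)
MATH_BOLD = chr(TAG_BASE + 1)

def is_tag(ch: str) -> bool:
    return TAG_BASE <= ord(ch) < TAG_BASE + TAG_COUNT

def tagged_chars(text: str) -> list[tuple[str, int]]:
    """Split text into (base character, tag bitmask) pairs. Bit n is set
    when tag n follows the base, so the order of the tags is irrelevant."""
    out: list[tuple[str, int]] = []
    for ch in text:
        if is_tag(ch) and out:
            base, mask = out[-1]
            out[-1] = (base, mask | (1 << (ord(ch) - TAG_BASE)))
        else:
            # A base character (or a stray tag with nothing before it).
            out.append((ch, 0))
    return out

def canonicalize(text: str) -> str:
    """Re-emit each tag run in ascending tag number, i.e. the list order,
    so math-unaware substring search sees one spelling per variant.
    Repeated tags collapse to a single occurrence."""
    parts = []
    for base, mask in tagged_chars(text):
        parts.append(base)
        parts.extend(chr(TAG_BASE + n) for n in range(TAG_COUNT) if (mask >> n) & 1)
    return "".join(parts)

bold_first = "H" + MATH_BOLD + MATH_ITALIC
italic_first = "H" + MATH_ITALIC + MATH_BOLD  # the canonical order
print(bold_first == italic_first)                              # False
print(tagged_chars(bold_first) == tagged_chars(italic_first))  # True
print(canonicalize(bold_first) == italic_first)                # True
```

Because the proposed block holds 16 tags, every combination of tags on one base character fits in a 16-bit mask, which keeps the order-insensitive comparison to a single integer test per character.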

 

To allow for other cases not yet defined, it’s desirable to reserve a block of 16 such math tags.