|
The identifier string is cleaned of characters that are expected to
occur rarely in object identifiers but that would cause certain known
problems for file systems. In this step, every UTF-8 octet outside the
range of visible ASCII (94 characters with hexadecimal codes 21-7e)
[ASCII] (Cerf, “ASCII format for network interchange,” October 1969), as
well as the following visible ASCII characters:
    "   hex 22        <   hex 3c        ?   hex 3f
    *   hex 2a        =   hex 3d        ^   hex 5e
    +   hex 2b        >   hex 3e        |   hex 7c
    ,   hex 2c
must be converted to their corresponding 3-character hexadecimal
encoding, ^hh, where ^ is a circumflex and hh is two hex digits. For
example, ' ' (space) is converted to ^20 and '*' to ^2a.
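The first step described above can be sketched roughly as follows. This is a minimal illustration rather than the implementation behind this function; the names `ENCODE_CHARS` and `clean_step_one` are hypothetical and chosen only for the example.

```python
# Hypothetical names, used for illustration only.
# Visible ASCII characters that the first step hex-encodes.
ENCODE_CHARS = set('"*+,<=>?^|')


def clean_step_one(identifier: str) -> str:
    """Hex-encode octets outside visible ASCII and the listed characters."""
    out = []
    # Work on UTF-8 octets, since the rule is stated per octet.
    for octet in identifier.encode("utf-8"):
        ch = chr(octet)
        if octet < 0x21 or octet > 0x7E or ch in ENCODE_CHARS:
            # ^hh: a circumflex followed by two lowercase hex digits.
            out.append("^%02x" % octet)
        else:
            out.append(ch)
    return "".join(out)


print(clean_step_one(" "))  # ^20
print(clean_step_one("*"))  # ^2a
```

Because the rule applies to octets, a non-ASCII character that occupies several UTF-8 octets expands to several `^hh` triplets.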
In the second step, the following single-character to single-character
conversions must be done:
    / -> =
    : -> +
    . -> ,
These are characters that occur quite commonly in opaque identifiers
but present special problems for file systems. This step avoids requiring
them to be hex encoded (hence expanded to three characters), which keeps
the typical ppath (pairpath) reasonably short. Here are examples of
identifier strings, each followed by its cleaned form and then by its
ppath mapping:
id: ark:/13030/xt12t3
    -> ark+=13030=xt12t3
    -> ar/k+/=1/30/30/=x/t1/2t/3/
id: http://n2t.info/urn:nbn:se:kb:repos-1
    -> http+==n2t,info=urn+nbn+se+kb+repos-1
    -> ht/tp/+=/=n/2t/,i/nf/o=/ur/n+/nb/n+/se/+k/b+/re/po/s-/1/
id: what-the-*@?#!^!?
    -> what-the-^2a@^3f#!^5e!^3f
    -> wh/at/-t/he/-^/2a/@^/3f/#!/^5/e!/^3/f/
(From section 3 of the Pairtree specification)
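Both cleaning steps and the ppath mapping can be sketched together as below. This is an illustrative sketch under stated assumptions, not this function's actual code: the names `clean` and `to_ppath` are hypothetical, and the ppath mapping is assumed to split the cleaned string into two-character segments, each followed by a slash, as the examples suggest.

```python
# Hypothetical names, used for illustration only.
ENCODE_CHARS = set('"*+,<=>?^|')
# Second step: single-character substitutions  / -> =   : -> +   . -> ,
SECOND_STEP = str.maketrans("/:.", "=+,")


def clean(identifier: str) -> str:
    """Apply the hex-encoding step, then the single-character mapping."""
    first = "".join(
        "^%02x" % o if (o < 0x21 or o > 0x7E or chr(o) in ENCODE_CHARS) else chr(o)
        for o in identifier.encode("utf-8")
    )
    # The order matters: '=', '+' and ',' produced here must not be hex-encoded.
    return first.translate(SECOND_STEP)


def to_ppath(cleaned: str) -> str:
    """Assumed ppath mapping: two-character segments, each followed by '/'."""
    pairs = [cleaned[i:i + 2] for i in range(0, len(cleaned), 2)]
    return "/".join(pairs) + "/"


cleaned = clean("ark:/13030/xt12t3")
print(cleaned)            # ark+=13030=xt12t3
print(to_ppath(cleaned))  # ar/k+/=1/30/30/=x/t1/2t/3/
```

Hex encoding runs first so that the `=`, `+` and `,` characters introduced by the second step are left as they are; reversing the order would expand them to `^3d`, `^2b` and `^2c`.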
- Parameters:
id (identifier) - The identifier to encode according to the Pairtree 0.1
specification
- Returns:
- The encoded identifier, as a string
|