Commit 368abd4

Document MultiAddr codecs implemented by py-multiaddr
1 parent e088ae7 commit 368abd4

1 file changed

+160
-1
lines changed

README.md

Lines changed: 160 additions & 1 deletion
@@ -104,6 +104,165 @@ TODO: specify the encoding (byte-array to string) procedure

TODO: specify the decoding (string to byte-array) procedure

### Codecs

Depending on the protocol of a Multiaddr component, different algorithms are used to
convert its value from/to the binary representation. The name of the codec to use
for each protocol is noted in [protocols.csv](protocols.csv).

In general, empty values in the string representation are always disallowed unless
explicitly noted otherwise. In case of conversion errors, implementations must
refuse to process the given string/binary value and report the error to the caller
instead.

Depending on the codec, values are either encoded using the standard variable
length encoding style, or as a specific static-length binary value without the
extra length information if this is noted in the respective codec's description.
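
To illustrate the difference between the two styles, the following sketch frames a value both ways. It assumes that the variable length encoding style prefixes the value with its byte length as an unsigned varint; this section does not spell that framing out, so treat it as an assumption. The helper names `encode_uvarint`, `encode_variable` and `encode_static` are hypothetical.

```python
def encode_uvarint(value: int) -> bytes:
    """Encode a non-negative integer as an unsigned varint (7 bits per byte)."""
    if value < 0:
        raise ValueError("an unsigned varint cannot encode negative numbers")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)  # high bit clear: last byte
            return bytes(out)


def encode_variable(value: bytes) -> bytes:
    """Variable-length style: assumed varint length prefix, then the value."""
    return encode_uvarint(len(value)) + value


def encode_static(value: bytes, size: int) -> bytes:
    """Static-length style: the value only, which must be exactly `size` bytes."""
    if len(value) != size:
        raise ValueError(f"expected {size} bytes, got {len(value)}")
    return value
```

A 4-byte `ip4` value would pass through `encode_static(value, 4)` unchanged, while a codec without a fixed size would go through `encode_variable`.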

All code examples are written in Python-based pseudocode and are optimized for
legibility rather than speed. In general, you should use existing libraries
and functions for performing the conversions below rather than rolling your own.

#### `fspath`

Encodes the given Unicode string using the system's local file system encoding.
On Windows this encoding is likely UTF-16, while it is UTF-8 on most other
systems. It is up to the library to figure out the best encoding for
these kinds of strings.

* String → Binary: `str.encode(SYSTEM_FILESYSTEM_ENCODING)`
* Binary → String: `bytes.decode(SYSTEM_FILESYSTEM_ENCODING)`

Protocols using the `fspath` encoding must not be shared between different hosts.
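
A minimal sketch of the two conversions, assuming that `sys.getfilesystemencoding()` is an acceptable stand-in for `SYSTEM_FILESYSTEM_ENCODING`. Python reports its own notion of the file system encoding, which need not match the platform's native one, and the function names are hypothetical.

```python
import sys

# Assumption: Python's reported file system encoding stands in for
# SYSTEM_FILESYSTEM_ENCODING from the pseudocode.
SYSTEM_FILESYSTEM_ENCODING = sys.getfilesystemencoding()


def fspath_to_bytes(text: str) -> bytes:
    """String -> Binary for the `fspath` codec."""
    if not text:
        raise ValueError("empty values are not allowed")
    return text.encode(SYSTEM_FILESYSTEM_ENCODING)


def fspath_to_str(binary: bytes) -> str:
    """Binary -> String for the `fspath` codec."""
    if not binary:
        raise ValueError("empty values are not allowed")
    return binary.decode(SYSTEM_FILESYSTEM_ENCODING)
```

Python also offers `os.fsencode` and `os.fsdecode`, which use the same encoding but with a more lenient error handler; which of the two behaviours is preferable depends on the library.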

#### `idna`

Encodes the given Unicode representation according to IDNA-2008 ([RFC 5890](https://tools.ietf.org/html/rfc5890)) conventions using the [UTS-46](https://www.unicode.org/reports/tr46/) input normalization and processing rules.

* String → Binary:
    1. Normalize and validate the given input string according to [UTS-46 Section 4 (Processing)](https://www.unicode.org/reports/tr46/#Processing) and [UTS-46 Section 4.1 (Validity Criteria)](https://www.unicode.org/reports/tr46/#Validity_Criteria) with the following parameters:
        * UseSTD3ASCIIRules = true
        * CheckHyphens = true
        * CheckBidi = true
        * CheckJoiners = true
        * Transitional_Processing = false
    2. Convert the Unicode string to ASCII using the [UTS-46 Section 4.2 (ToASCII)](https://www.unicode.org/reports/tr46/#ToASCII) algorithm steps 2–6 with parameter *VerifyDnsLength* set to *true* and return the result.
* Binary → String:
    Convert the ASCII text string to Unicode according to the rules of [UTS-46 Section 4.3 (ToUnicode)](https://www.unicode.org/reports/tr46/#ToUnicode) using the same parameters as in step 1 of the *String → Binary* algorithm.

Examples of libraries for performing the above steps include the [Python idna](https://pypi.org/project/idna/) library.

#### `ip4`

Encodes an IPv4 address according to the conventional [dot-decimal notation](https://en.wikipedia.org/wiki/Dot-decimal_notation) first specified in [RFC 3986 section 3.2.2 page 20 § 2](https://tools.ietf.org/html/rfc3986#page-20).

Protocols using this codec must encode it as a binary value of exactly 4 bytes without
an extra length value.

* String → Binary:
    1. Split the input string into parts at each dot (U+002E FULL STOP):
       `parts = str.split(".")`
    2. Assert that exactly 4 string parts were created by the split operation:
       `assert len(parts) == 4`
    3. Convert each part from its ASCII base-10 number representation to an integer type, aborting if the conversion fails for any of the decimal string parts:
       `octets = [int(p) for p in parts]`
    4. Validate that each part of the resulting integer list is in range 0 – 255:
       `assert all(i in range(0, 256) for i in octets)`
    5. Copy each of the resulting integers into a binary string of length 4 in network byte-order:
       `return b"%c%c%c%c" % (octets[0], octets[1], octets[2], octets[3])`
* Binary → String:
    1. Take the four bytes of the binary input and convert each to its equivalent base-10 ASCII representation without any leading zeros:
       `octets = [str(binary[idx]) for idx in range(4)]`
    2. Concatenate the resulting list of stringified octets using dots (U+002E FULL STOP):
       `return ".".join(octets)`

Converting from string to binary addresses may be done using the POSIX
[`inet_addr`](https://pubs.opengroup.org/onlinepubs/9699919799/functions/inet_addr.html)
function or the similar common Unix [`inet_aton`](https://man.cx/inet_aton(3))
function and its equivalent bindings in many other languages. Note that these
functions also accept shorthand and octal/hexadecimal forms that the algorithm
above rejects. Similarly, the POSIX
[`inet_ntoa`](https://pubs.opengroup.org/onlinepubs/9699919799/functions/inet_ntoa.html)
function available in many languages implements the previously mentioned binary
to string address transformation.
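
The conversion steps listed for this codec can be sketched as runnable Python. The function names are hypothetical, and the digit check is an assumption of this sketch: it is stricter than a bare `int()` call, which would also accept signs, surrounding whitespace and underscores.

```python
def ip4_to_bytes(text: str) -> bytes:
    """String -> Binary for the `ip4` codec (4 bytes, network byte order)."""
    parts = text.split(".")
    if len(parts) != 4:
        raise ValueError("expected exactly 4 dot-separated parts")
    octets = []
    for part in parts:
        # Only plain ASCII digits are accepted; this also rejects empty parts.
        if not (part.isascii() and part.isdigit()):
            raise ValueError(f"not a decimal number: {part!r}")
        octet = int(part, 10)
        if octet > 255:
            raise ValueError(f"octet out of range: {octet}")
        octets.append(octet)
    return bytes(octets)


def ip4_to_str(binary: bytes) -> str:
    """Binary -> String for the `ip4` codec."""
    if len(binary) != 4:
        raise ValueError("expected exactly 4 bytes")
    # Iterating over bytes yields integers, which str() renders without leading zeros.
    return ".".join(str(octet) for octet in binary)
```

For example, `ip4_to_bytes("127.0.0.1")` yields `b"\x7f\x00\x00\x01"`, and `ip4_to_str` turns those bytes back into the same string.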

#### `ip6`

Encodes an IPv6 address according to the rules of [RFC 4291 section 2.2](https://tools.ietf.org/html/rfc4291#section-2.2) and [RFC 5952 section 4](https://tools.ietf.org/html/rfc5952#section-4).

Protocols using this codec must encode it as a binary value of exactly 16 bytes without
an extra length value.

* String → Binary:
    Parse the given input address string according to the rules of [RFC 4291 section 2.2](https://tools.ietf.org/html/rfc4291#section-2.2), creating a 16-byte binary string. All textual variations (upper-/lower-casing, IPv4-mapped addresses, zero-compression, stripping of leading zeros) must be supported by the parser. Note that [scoped IPv6 addresses containing a zone identifier](https://tools.ietf.org/html/draft-ietf-ipngwg-scopedaddr-format-02) must not appear in the input string; external mechanisms may be used to encode the zone identifier separately, though.
* Binary → String:
    Generate a canonical textual representation of the given binary input address according to the rules of [RFC 5952 section 4](https://tools.ietf.org/html/rfc5952#section-4). Implementations must not produce any of the variations allowed by RFC 4291 mentioned above, to ensure that all implementations produce a character-by-character identical string representation.

Converting between string and binary addresses should be done using the equivalent
of the POSIX [`inet_pton`](https://pubs.opengroup.org/onlinepubs/9699919799/functions/inet_pton.html)
and [`inet_ntop`](https://pubs.opengroup.org/onlinepubs/9699919799/functions/inet_ntop.html)
functions. Alternatively, the BSD
[`getaddrinfo`/`freeaddrinfo`](https://pubs.opengroup.org/onlinepubs/9699919799/functions/getaddrinfo.html)
and [`getnameinfo` with `NI_NUMERICHOST`](https://pubs.opengroup.org/onlinepubs/9699919799/functions/getnameinfo.html)
functions may be viable for some environments.
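
As a sketch of how this could look with an existing library, the following uses Python's standard `ipaddress` module instead of the POSIX functions. The function names are hypothetical. The module emits a lower-case, zero-compressed form; whether that matches RFC 5952 for special cases such as IPv4-mapped addresses should be verified before relying on it.

```python
import ipaddress


def ip6_to_bytes(text: str) -> bytes:
    """String -> Binary for the `ip6` codec (16 bytes)."""
    if "%" in text:
        # Zone identifiers (e.g. "fe80::1%eth0") must not appear in the input.
        raise ValueError("zone identifiers are not allowed")
    # AddressValueError raised on malformed input is a subclass of ValueError.
    return ipaddress.IPv6Address(text).packed


def ip6_to_str(binary: bytes) -> str:
    """Binary -> String for the `ip6` codec."""
    if len(binary) != 16:
        raise ValueError("expected exactly 16 bytes")
    return str(ipaddress.IPv6Address(binary))
```

With these helpers, the non-canonical input `"2001:DB8:0:0:0:0:0:1"` and the canonical `"2001:db8::1"` parse to the same 16 bytes, and converting those bytes back produces `"2001:db8::1"`.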

#### `onion`

Encodes a [Tor rendezvous version 2 service pointer](https://gitweb.torproject.org/torspec.git/tree/rend-spec-v2.txt?id=471af27b55ff3894551109b45848f2ce1002441b#n525) (aka .onion-address) and the exposed service port on that system.

Protocols using this codec must encode it as a binary value of exactly 12 bytes without
an extra length value.

* String → Binary:
    1. Split the input string into 2 parts at the colon character (U+003A COLON):
       `(service_str, port_str) = str.split(":")`
    2. Decode the *service* part before the colon using base32 into binary:
       `service_bin = b32decode(service_str)`
    3. Convert the *port* part to a binary string (`port_bin`) as specified by the [`uint16be`](#uint16be) codec.
    4. Concatenate the service and port parts to obtain the final binary encoding:
       `return service_bin + port_bin`
* Binary → String:
    1. Split the binary value at the last two bytes into a service name and a port
       number:
       `(service_bin, port_bin) = (binary[:-2], binary[-2:])`
    2. Convert the service part into a base32 string:
       `service_str = b32encode(service_bin)`
    3. Convert the *port* part to text (`port_str`) as specified by the [`uint16be`](#uint16be) codec.
    4. Concatenate the resulting strings using a colon:
       `return service_str + ":" + port_str`
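
The steps above can be sketched with Python's standard `base64` module. The function names are hypothetical, and two details are assumptions that this section does not state: the service part is decoded case-insensitively and emitted in lower case (the customary spelling of .onion addresses), and it must decode to 10 bytes so that the total comes to 12.

```python
import base64


def onion_to_bytes(text: str) -> bytes:
    """String -> Binary for the `onion` codec (10 service bytes + 2 port bytes)."""
    service_str, port_str = text.split(":")  # ValueError unless exactly 2 parts
    # casefold=True accepts lower-case input (assumption of this sketch).
    service_bin = base64.b32decode(service_str, casefold=True)
    if len(service_bin) != 10:
        raise ValueError("service part must decode to exactly 10 bytes")
    if not (port_str.isascii() and port_str.isdigit()):
        raise ValueError(f"not a decimal port number: {port_str!r}")
    port = int(port_str, 10)
    if port > 0xFFFF:
        raise ValueError(f"port out of range: {port}")
    return service_bin + port.to_bytes(2, "big")


def onion_to_str(binary: bytes) -> str:
    """Binary -> String for the `onion` codec."""
    if len(binary) != 12:
        raise ValueError("expected exactly 12 bytes")
    service_bin, port_bin = binary[:-2], binary[-2:]
    # b32encode returns upper-case ASCII; lower-casing is an assumption here.
    service_str = base64.b32encode(service_bin).decode("ascii").lower()
    return service_str + ":" + str(int.from_bytes(port_bin, "big"))
```

Malformed base32 input raises `binascii.Error`, which is a subclass of `ValueError`, so callers can handle all conversion failures uniformly.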

#### `p2p`

Encodes a libp2p node address.

TBD: Is this really always a base58btc encoded string of at least 5 characters in length!?

#### `uint16be`

Encodes an unsigned 16-bit integer value (such as a port number) in network byte
order (big endian).

Protocols using this codec must encode it as a binary value of exactly 2 bytes without
an extra length value.

* String → Binary:
    1. Parse the input string as a base-10 integer:
       `integer = int(str, 10)`
    2. Verify that the integer is in the valid range for an unsigned 16-bit integer:
       `assert integer in range(65536)`
    3. Convert the integer to a 2-byte long big endian binary string:
       `return b"%c%c" % ((integer >> 8) & 0xFF, integer & 0xFF)`
* Binary → String:
    1. Convert the two input bytes to a native integer:
       `integer = port_bin[0] << 8 | port_bin[1]`
    2. Generate a base-10 string representation from this integer:
       `return str(integer)`

POSIX/BSD provides [`strtoul`](https://en.cppreference.com/w/c/string/byte/strtoul)
and [`htons`](https://pubs.opengroup.org/onlinepubs/9699919799/functions/htons.html)
for the string to binary conversion and
[`ntohs`](https://pubs.opengroup.org/onlinepubs/9699919799/functions/ntohs.html)
and [`snprintf`](https://en.cppreference.com/w/c/io/snprintf) for performing
the inverse operation.
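
A runnable sketch of both directions, using `int.to_bytes` and `int.from_bytes` in place of the manual shifts. The function names are hypothetical, and the digit check is an assumption of this sketch that is stricter than a bare `int()` call.

```python
def uint16be_to_bytes(text: str) -> bytes:
    """String -> Binary for the `uint16be` codec (2 bytes, big endian)."""
    # Only plain ASCII digits are accepted; this also rejects the empty string.
    if not (text.isascii() and text.isdigit()):
        raise ValueError(f"not a decimal number: {text!r}")
    integer = int(text, 10)
    if integer > 0xFFFF:
        raise ValueError(f"value out of range: {integer}")
    return integer.to_bytes(2, "big")


def uint16be_to_str(binary: bytes) -> str:
    """Binary -> String for the `uint16be` codec."""
    if len(binary) != 2:
        raise ValueError("expected exactly 2 bytes")
    return str(int.from_bytes(binary, "big"))
```

For example, port `"8080"` encodes to `b"\x1f\x90"`, and decoding those two bytes gives `"8080"` again.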

## Protocols

@@ -156,4 +315,4 @@ Small note: If editing the README, please conform to the [standard-readme](https

## License

159-
This repository is only for documents. All of these are licensed under the [CC-BY-SA 3.0](https://ipfs.io/ipfs/QmVreNvKsQmQZ83T86cWSjPu2vR3yZHGPm5jnxFuunEB9u) license, © 2016 Protocol Labs Inc. Any code is under a [MIT](LICENSE) © 2016 Protocol Labs Inc.
318+
This repository is only for documents. All of these are licensed under the [CC-BY-SA 4.0](https://ipfs.io/ipfs/QmVreNvKsQmQZ83T86cWSjPu2vR3yZHGPm5jnxFuunEB9u) license, © 2016 Protocol Labs Inc, © 2019 Alexander Schlarb. Any code is under a [MIT](LICENSE) © 2016 Protocol Labs Inc, © 2019 Alexander Schlarb.
