1 files changed, 136 insertions, 0 deletions
diff --git a/doc/technical/hostmask.txt b/doc/technical/hostmask.txt
new file mode 100644
index 0000000..892bc93
--- /dev/null
+++ b/doc/technical/hostmask.txt
@@ -0,0 +1,136 @@
+The Hostmask and Netmask System
+Copyright(c) 2001 by Andrew Miller(A1kmm)<a1kmm@mware.virtualave.net>
+
+$Id$
+------------------------------------------------------------------------
+
+Contents ::
+============
+* Section 1: Motivation
+* Section 2: Underlying Mechanism
+  - 2.1: General Overview
+  - 2.2: IPv4 Netmasks
+  - 2.3: IPv6 Netmasks
+  - 2.4: Hostmasks
+* Section 3: Exposed Abstraction Layer
+  - 3.1: Parsing Masks
+  - 3.2: Adding Configuration Items
+  - 3.3: Initialising and Rehashing
+  - 3.4: Finding IP/Hostname Confs
+  - 3.5: Deleting Entries
+  - 3.6: Reporting Entries
+
+Section 1: Motivation
+=====================
+
+Looking up configuration hostnames and IP addresses (such as for I-Lines
+and K-Lines) needs to be implemented efficiently. It turns out a hash
+based algorithm like that employed here performs very well on the average
+case, which is what we should be the most concerned about. A profiling
+comparison with the mtrie code using data from a real network confirmed
+that this algorithm performs much better.
+
+
+Section 2: Underlying Mechanism
+===============================
+
+2.1: General Overview
+---------------------
+
+In short, a hash table with linked lists for buckets is used to locate
+the correct hostname/netmask entries. In order to support CIDR IPs and
+wildcard masks, the entire key cannot be hashed; only the fixed part of
+a mask is hashed when it is added, so a lookup must rehash successively
+shorter parts of the key. The means for deciding how much to hash differs
+between the hostmasks and IPv4/6 netmasks.
+
+2.2: IPv4 Netmasks
+------------------
+
+In order to hash IPv4 netmasks for addition to the hash table, the mask is
+first parsed into a 32-bit address and a count of significant bits. All
+unused bits are set to 0. The mask can take these forms:
+
+1.2.3.4     => 1.2.3.4  : 32
+1.2.3.*     => 1.2.3.0  : 24
+1.2.*.*     => 1.2.0.0  : 16
+1.2.3.64/26 => 1.2.3.64 : 26
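A minimal sketch of the normalisation shown in the table above, written in Python for brevity rather than taken from the ircd source. The function name parse_ipv4_mask is hypothetical, and accepting short forms such as 192/7 by zero-padding the missing octets is an assumption.

```python
import ipaddress

def parse_ipv4_mask(text):
    """Return (address, bits) for an IPv4 mask; unused bits are set to 0.

    Hypothetical helper, not the ircd implementation.
    """
    addr_text, _, bits_text = text.partition("/")
    octets = addr_text.split(".")
    bits = int(bits_text) if bits_text else 32
    if "*" in octets:
        # Every octet from the first "*" onwards is unused.
        first_wild = octets.index("*")
        bits = min(bits, 8 * first_wild)
        octets = octets[:first_wild]
    # Assumption: short forms such as "192/7" are padded with zero octets.
    octets += ["0"] * (4 - len(octets))
    addr = int(ipaddress.IPv4Address(".".join(octets)))
    mask = (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF
    return str(ipaddress.IPv4Address(addr & mask)), bits
```

Under these assumptions, parse_ipv4_mask("1.2.3.*") gives ("1.2.3.0", 24) and parse_ipv4_mask("1.2.3.64/26") gives ("1.2.3.64", 26), matching the table.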
+
+The number of whole bytes covered by the mask is then calculated, and only
+those bytes are hashed (e.g. 1.2.3.64/26 and 1.2.3.0/24 hash the same).
+To find a match for a complete IPv4 address, the entire address is hashed
+first and looked up in the table. Then the most significant three bytes
+are hashed, followed by the most significant two and the most significant
+one. Finally the "identity hash" bucket is searched, to match masks
+shorter than one byte, like 192/7.
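The whole-byte rule and the lookup order described above can be sketched as follows. This is an illustration in Python, not the ircd code: hashed_prefix and lookup_keys are hypothetical names, and the actual hash function is left out, so the "key" here is just the masked address paired with its bit count.

```python
import ipaddress

def hashed_prefix(addr, bits):
    """Keep only the whole bytes covered by `bits`; zero everything else."""
    whole = bits - (bits % 8)            # round down to a byte boundary
    if whole == 0:
        return 0                         # nothing to hash: identity bucket
    mask = (0xFFFFFFFF << (32 - whole)) & 0xFFFFFFFF
    return addr & mask

def lookup_keys(ip):
    """Return (prefix, bits) pairs in the order a lookup would try them."""
    addr = int(ipaddress.IPv4Address(ip))
    # 32, 24, 16 and 8 bits, then 0 bits for the identity bucket.
    return [(hashed_prefix(addr, bits), bits) for bits in (32, 24, 16, 8, 0)]
```

In this sketch 1.2.3.64/26 and 1.2.3.0/24 both reduce to the prefix 1.2.3.0, so a lookup for 1.2.3.70 reaches either entry on its second probe. Each bucket entry would still need a full mask comparison, since the hash only narrows the candidates.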
+
+2.3: IPv6 Netmasks
+------------------
+
+As per the IPv4 netmasks, except that instead of rehashing with one byte
+granularity, a 16-bit (two byte) granularity is used. Byte granularity
+would need 16 rehashes per lookup, which is considered too great a fixed
+cost to be justified for a (possible) slight reduction in hash collisions.
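To make the trade-off concrete, here is a rough Python sketch of the probes a lookup would make at 16-bit granularity. The name ipv6_lookup_keys is hypothetical, and the final identity-bucket probe is assumed by analogy with the IPv4 case.

```python
import ipaddress

def ipv6_lookup_keys(ip):
    """Return (prefix, bits) pairs tried for an IPv6 address, longest first."""
    addr = int(ipaddress.IPv6Address(ip))
    keys = []
    for bits in range(128, 0, -16):      # 128, 112, ..., 16
        mask = ((1 << bits) - 1) << (128 - bits)
        keys.append((addr & mask, bits))
    keys.append((0, 0))                  # assumed identity-bucket probe
    return keys
```

This yields eight hashed prefixes plus the identity bucket, against sixteen hashed prefixes if each byte were probed separately.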
+
+2.4: Hostmasks
+--------------
+
+On adding a hostmask to the hash table, the part of the hostmask to the
+right of the first dot following the last wildcard character is hashed
+(e.g. for *.example.com, "example.com" is hashed). If there are no
+wildcards in the hostmask, the entire string is hashed.
+
+On searching for a hostmask match, the entire hostname is hashed, followed
+by the part of the hostname after the first dot, followed by the part
+after the second dot, and so on. Finally the "identity hash" bucket is
+checked to catch hostmasks like *test*, which have no dot after the last
+wildcard.
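The insertion and search rules above can be sketched like this, as a Python illustration and not the ircd code. Both function names are hypothetical, treating "?" as a wildcard alongside "*" is an assumption, and the empty string stands in for the identity bucket.

```python
def hashed_part(mask):
    """Part of a hostmask hashed on insertion ('' means the identity bucket)."""
    # Assumption: both "*" and "?" count as wildcard characters.
    last_wild = max(mask.rfind("*"), mask.rfind("?"))
    if last_wild == -1:
        return mask                      # no wildcards: hash the whole string
    dot = mask.find(".", last_wild)
    if dot == -1:
        return ""                        # no dot after the last wildcard
    return mask[dot + 1:]

def lookup_candidates(hostname):
    """Strings hashed, in order, when searching for a hostname."""
    parts = hostname.split(".")
    candidates = [".".join(parts[i:]) for i in range(len(parts))]
    candidates.append("")                # finally, the identity bucket
    return candidates
```

For irc.example.com the candidates are irc.example.com, example.com, com and the identity bucket, so a mask such as *.example.com (hashed as example.com) is found on the second probe.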
+
+Section 3: Exposed Abstraction Layer
+====================================
+
+Section 3.1: Parsing Masks
+--------------------------
+
+Call "parse_netmask()" with the netmask and a pointer to an irc_inaddr
+structure to be filled in, as well as a pointer to an integer where the
+number of bits will be placed.
+
+Always check the return value. If it returns HM_HOST, the mask is
+probably a hostmask. If it returns HM_IPV4, it was an IPv4 address. If
+it returns HM_IPV6, it was an IPv6 address. If parse_netmask() returns
+HM_HOST, no change is made to the irc_inaddr structure or the number of
+bits.
+
+Section 3.2: Adding Configuration Items
+---------------------------------------
+
+Call "add_conf_by_address()" with the hostname or IP mask, the username,
+and the ConfItem* to associate with this mask.
+
+Section 3.3: Initialising and Rehashing
+---------------------------------------
+
+To initialise, call "init_host_hash()". This only needs to be done once
+on start-up. On rehash, call "clear_out_address_conf()" to wipe out the
+old, unwanted configuration entries and free those that have no
+remaining references.
+
+Section 3.4: Finding IP/Hostname Confs
+---------------------------------------
+
+Call "find_address_conf()" with the hostname, the username, the address,
+the address family and the client-supplied password. To find a D-Line,
+call "find_dline()" with the address and address family.
+
+Section 3.5: Deleting Entries
+-----------------------------
+
+Call "delete_one_address_conf()" with the hostname and the ConfItem*.
+
+Section 3.6: Reporting Entries
+------------------------------
+
+Call "report_dlines()", "report_exemptlines()", "report_Klines()", or
+"report_Ilines()" with the client pointer to report to. Note that these
+walk the entire hash table, which is inefficient, but they are not called
+often enough to justify the memory and maintenance clock cycles needed
+for a more efficient data structure.