This Internet Gopher client written in the C programming language indexes directories and files. It is implemented in accordance with RFC 1436. The overall design is to ensure that outbound requests strictly adhere to the protocol standard whilst handling inbound responses as gracefully as possible.
The C programming language is used because of its extensive use in native Linux programs and its efficiency relative to alternatives (e.g., Python, Java). A standard Linux installation should be able to execute the program; no additional packages are required.
The program is intended for Linux machines with GCC installed. The standard
C library is sufficient for compilation. Use the command make to compile the
program with the source code. Simply execute the command
./client <hostname> <port> to run the program. For example,
if the Gopher server is hosted locally at port 70, ./client localhost 70
connects to it. An IP address can also be directly used. As an example, the
command ./client 127.0.0.1 70 is also valid.
For local testing, a local Gopher server can be started using
Motsognir with the command
sudo motsognir. The listening port and the process can be listed using the
command sudo lsof -i :70.
The overall design relies on performing a breadth-first search (BFS) of the
filesystem hosted on the Gopher server. The indexed files and directories are
recorded in a linked list implemented using the entry struct. Each entry
contains the type of the record (item_type), the record itself (record).
The pointer next directs to the next entry in the linked list.
There are six types of records.
Entry type (item_type) |
Record (record) |
|---|---|
DIRECTORY |
Pathname to the directory |
TEXT |
Pathname to the text file |
BINARY |
Pathname to the binary file |
ERROR |
Invalid request line |
EXTERNAL |
Request line referencing an external server |
TIMEOUT |
Pathname to the file taking excessive time to download |
TOO_LARGE |
Pathname to the file exceeding the size limit |
Since the protocol is stateless and the connection is terminated once the server
has responded, the function gopher_connect() establishes and closes a
connection every time a new request is made. The function gopher_connect()
appends "\r\n" to the request which is passed as the parameter
request. It sends the request line to the server. The function func(),
passed as another argument to gopher_connect(), is responsible for receiving
the response from the server.
Instantiations of func() include indexing(), evaluate_file_size() and
print_response(). With reference to the protocol guidelines, the client
listens for a response until the server terminates the connection. Therefore,
these three functions use recv() provided by the Socket API in a while loop.
The built-in timeval struct and the errno library in C are utilised for
handling timeouts and failed responses. Notice that a failure to receive a
response is handled differently from an empty string response from the server.
As long as the server adheres to the protocol's format, its response to a
pathname is always the directory index with the first character of a row
indicating the nature of the file. For instance, 0 indicates that the file
is a (non-binary) text file whereas 1 refers to a subdirectory or an external
server. The function indexing() uses this information to determine the type
of the file/directory. A new entry is added to the linked list, including the
item's pathname and type.
If the request is invalid, the corresponding response begins with 3. In this
case, a new entry is added to the linked list if and only if it does not already
exist. This enables us to count the number of invalid references at a later
stage.
With reference to RFC 1436, the canonical type 9 refers to binary files.
Actual server implementations often have more specific types and non-canonical
types that gained popularity after RFC 1436 was published.
In this client program, files that are not in plain text are considered binary.
| Classification | File types |
|---|---|
| Directory | 1 |
| Reference to external server | 1 |
| Text file | 0 |
| Error | 3 |
| Binary file | 4 (BinHex-encoded file), 5 (DOS file), 6 (uuencoded file), 9 (Binary file), g (GIF), I (Image), : (Bitmap), ; (Movie), < (Audio), d (Document), h (HTML), p (PNG), r (RTF), s (Sound), P (PDF), X (XML) |
Any row in the response starting with the character i is a human-readable
informational message, thus ignored by the indexation process. Other references
such as Telnet (8 and T), CCSO nameserver (2) and mirror (+) do not fall
within any of the categories above. These types are therefore disregarded by
the indexation process.
Following the indexation of the root directory using the request "\r\n",
main() goes through the linked list and calls gopher_connect() for every
subdirectory. This effectively models a breadth-first search of the filesystem
hosted on the Gopher server.
Upon indexation of all directories and files in the filesystem, evaluate() is
called to find the following information:
- Number of directories, text files and binary files
- Content of the smallest text file (if multiple exist, the first one indexed)
- Sizes of the smallest and the largest text files
- Sizes of the smallest and the largest binary files
- Number of invalid references (each unique invalid request is counted once)
- List of external server references (hostname and port) and their connectivity
- List of references causing issues/errors
Note that if the server ends its response with .\r\n as per the protocol
standard, the three characters are included in the size counts. This helps
with determining whether the server is replying with proper responses.
The helper function test_external_servers() checks the connectivity to
external servers by attempting to establish a connection. A server is
considered "up" if it accepts the connection in a few seconds. Otherwise, it is
marked as "down".
As the client program runs, logs are printed to the terminal (stdout and
stderr). The terminal outputs to stdout are in the following formats.
Request sent at <timestamp>: <request>Indexed <item type>: <pathname of newly indexed item>Number of <directories/text files/binary files/invalid references>: <number>Size of the <smallest/largest text/binary file>: <number>- Other information produced by
evaluate()
Issues and errors are printed to the terminal through stderr:
Usage: <program name> <hostname> <port>Error: <error message>File too large: <pathname>Transmission timeout: <pathname>
Whilst this client program follows the Gopher protocol standard in RFC 1436,
measures are taken to reduce the impact of malformed responses from servers.
For instance, the response is written to a string variable
and is always explicitly terminated with the null terminator (\0). This
avoids problems such as out-of-bound reading/writing.
Safer built-in functions in C (such as strncpy() rather than strcpy()) are
used to avoid unexpected behaviour that might possibly be the result of long
requests or responses. Indexed records in the linked list, with memory allocated
using malloc(), are cleaned up by cleanup() before main() terminates.
At the beginning of gopher_connect(), the program is terminated if the
connection cannot be established. An error message is printed to stderr
specifying the issue.
A correctly implemented server should respond with some content under all circumstances. "Empty response from the server" is printed to the terminal if the server terminates the connection without any response. This clarifies to the user whether the connection is terminated or the client is still waiting for a response.
Servers may be busy or unresponsive at times. There are two types of timeouts.
- The server accepts the connection but does not respond to a request.
- The server accepts the connection and responds but it takes too much time.
- The server does not accept the connection within the time limit.
Timeouts prevent the program from getting stuck indefinitely.
The first situation is managed using setsockopt(), a built-in feature in the
Socket API, and the timeval struct. The configuration is set by
gopher_connect(). The limit is currently set as 10 seconds.
The second situation is handled using select() and fcntl(). The
configuration is set by the function responsible for receiving the response.
The limit is currently defined as 5 seconds.
The third situation applies in test_external_servers() with a 5-second limit.
The built-in recv() function reads the responses from the server. The server
might send it with multiple packets. In indexing(), for example, a while loop
is used to read the response from the server and write it to the buffer until
the server disconnects.
We set BUFFER_SIZE is 65536 bytes as this is a reasonable size limit for most
directory indices on a Gopher server. It can be adjusted if the user of this
program has specific needs.
A well-implemented server's directory index response should include four pieces of information, separated by tab characters. In case a line therein is malformed, the program handles it gracefully to avoid indexing invalid things. It is also designed to handle long responses from the server.
Motsognir, for example, ensures that every line including errors and informational messages includes three tab characters as separators.
3error fake fake 0
iInformational fake fake 0
Other servers format these responses differently. The class server, for
instance, uses invalid rather than fake.
3error invalid 0
iInformation invalid 0
The client program ensures correct handling of responses when interacting with different servers.
A server's response should end with a line containing ".\r\n" followed by the
null terminator. However, if the server does not follow the standard protocol
to the full extent and terminates transmission without ".\r\n", the program
still handles the response gracefully. The client program perceives a server's
termination of the connection as the end of data transmission. An example is
/misc/malformed2 on the class server.
Unicode was not published at the time the Gopher protocol was publicised. We assume that pathnames contain only ASCII characters, or else converted into ASCII representations on the server side (as it is in Motsognir). For example, a file named "ɡoʊfər" is detected as "%C9%A1o%CA%8Af%C9%99r".
Some malformed responses omit the hostname and the port in an entry of the
directory index. Examples include /misc/malformed1. This client still attempts
to index the file, assuming it exists on the server. If the file does
not exist in the server, the pathname is regarded as an invalid reference.
As part of the evaluate() function's execution, the client has to download
the files for analysis of their properties, such as their size. The files may
sometimes be unexpectedly large. The global constant FILE_LIMIT specifies the
maximum file size that the client may accept. This ensures that the client will
not be flooded with infinite/unacceptable network traffic, of which malicious
parties may take advantage. Files exceeding the size limit are not considered
in the statistics (e.g., sizes of largest files) evaluated by evaluate().
Currently, FILE_LIMIT is set as 2^17 (131,072) bytes, considering that most files on
comp3310.ddns.net do not exceed this limit. In realistic situations, users
of this client should know the size scale of files that the target server
hosts. This global constant can be adjusted accordingly. Files exceeding the
file limit are logged and printed before the client program execution ends.
The client program is tested with the class server and a local Motsognir server. The terminal outputs are included for reference:
The program is also tested with gopher.floodgap.com at port 70. The first 1024
lines of terminal output are included in
assets/output3.txt for information.
Wireshark is used for monitoring the network traffic. The initial request is detected as "Request: [Directory List]". There are occasional retransmissions of TCP packets and spurious acknowledgement (ACK) signals.
In the screenshot below, the first three rows show the establishment of the TCP connection. The fourth row is the Gopher request for the root directory index. The sixth and eighth rows correspond to the response from the server, delivered with two packets. For each TCP packet received, the client sends an ACK signal to the server.
- Anklesaria et al. (March 1993). RFC 1436: The Internet Gopher Protocol. The RFC Series. https://www.rfc-editor.org/rfc/rfc1436
- Arkaitz Jimenez. (September 2009). Time stamp in the C programming language. Stack Overflow. https://stackoverflow.com/a/1444440
- Basile Starynkevitch. (September 2015). How to use
select()on sockets properly?. Stack Overflow. https://stackoverflow.com/a/32711642 - caf. (April 2010). C: socket connection timeout. Stack Overflow. https://stackoverflow.com/a/2597774
- dgookin. (July 2020). The
gettimeofday()Function. C For Dummies Blog. https://c-for-dummies.com/blog/?p=4236 - dhanshreekulkarni21. (March 2023).
strcat()in C. GeeksforGeeks. https://www.geeksforgeeks.org/strcat-in-c/ - Frédéric Hamidi. (October 2010). What does
EAGAINmean?. Stack Overflow. https://stackoverflow.com/a/4058377 - Foobar. (Oct 2018). Client in C++, use
gethostbynameorgetaddrinfo. Stack Overflow. https://stackoverflow.com/q/52727565 - IEEE and The Open Group. (2003).
setsockopt. Linux man page. https://linux.die.net/man/3/setsockopt - Milan. (Mar 2011). How to convert string to IP address and vice versa. Stack Overflow. https://stackoverflow.com/a/5328184
- Tomek Szpakowicz. (December 2009). When to use static keyword before global variables?. Stack Overflow. https://stackoverflow.com/a/1856642
- Wikipedia contributors. Gopher (protocol). Wikipedia. https://en.wikipedia.org/wiki/Gopher_(protocol)
Author: Alvin Tang dev@alvinylt.net
Last update: 2024-04-18
