Comprehensive logging and monitoring is key to being able to detect what your site is doing at runtime. If you do not implement it correctly, you will be unable to detect when your system is under attack, and you are liable to get compromised without knowing it.
Logging in Code
Modern programming languages come with logging packages that allow you to add lines of text to a log file as the application is running. Taking advantage of these packages will give you a papertrail of what your application is doing as users interact with it. Smart use of logging will allow you to spot bugs in your code, diagnose incorrect behavior in your application, troubleshoot user issuers, and raise alerts when cyber-attacks occur.
Key Elements in Log Statements
Each entry written to a log file should contain a timestamp, the log message, and an indication of where in the code the entry is being written from - in other words, the code file and line number. Depending on what type of functionality the code is handling, it is also useful to include some of the following:
- The server name if more than one server is writing to the log file.
- The URL, HTTP status code and incoming IP address if the code is handling an incoming HTTP request.
- The username if the code is being executed in response to an action by a logged-in user.
- Timing information if the code is executing a time-sensitive action.
- Diagnostic information - for instance, if any action had to re-tried, or if a component is responding slowly.
- Error messages and stack traces if an error occurs.
What Events to Log
You should log every HTTP request and the corresponding response, being sure to include the URL, the HTTP response code, and the time taken to service the request. You should also log any significant actions performed by you web application, including:
- Input validation failures, e.g. when the code encounters unexpected parameter names or values.
- Authentication successes and failures.
- Authorization (access control) failures.
- Session management failures, e.g. when session cookies are rejected as invalid.
- Application errors and system events.
- Startup and shutdown events - including timing information!
- User events like sign-ups, password changes and account deletions.
- Administrative events, e.g. when permissions are changed by an admin.
- Calls to third-party services or APIs.
What Not to Log
Writing sensitive information to your logs is a security risk - imagine what an attacker could do if they stole your log files! Make sure your logging statements never contain any of the following:
- User or system passwords.
- Encryption keys.
- Database connection strings.
- API keys for third-party services.
- Personally identifiable information about users.
- Payment information like credit card numbers.
- Sensitive HTTP headers, such as
- Session IDs or session cookies.
- Access tokens - for example, those used during sign-ups or password resets.
- Information a user has opted out of collection. The “right to be forgotten” is law in many parts of the world, and this includes data in log files.
By convention, most logging packages allow you to mark logging statements with
one of at least four “log levels” - typically named
ERROR, in order of importance. You should mark each logging statement with
an appropriate log level to make it easy to filter out the noise in log files.
Servers can be configured so only statements marked with a particular log level
before being written to the log file: a server configured with the
log-level will write
ERROR messages to the log file,
Low-level diagnostic events should be marked as
DEBUG statements, and
typically only show up in non-production environments. The
INFO log level
should be be use for normal running: the events expected to occur when users
interact with your website. Unexpected events should be logged with the
WARNING level, and errors should be marked with the
ERROR log level.
Other Types of Logging
In addition to the logging statements added to your code, other applications in your stack will typically output log files. A web server like Apache, Nginx or IIS will log HTTP request and response information. Databases will typically write log files too, which are useful for diagnosing performance issues. Make sure your team has access to these logs too when diagnosing issues.
Log Aggregation and Storage
Logs should be centralized and stored securely so they can viewed by administrators. “Log servers” like LogStash, Graylog, Splunk and PaperTrail will aggregate log files from different sources and allow them to be searched and analyzed in realtime. Log servers are often available as services, and are can be added as plug-ins if you run on cloud-based services.
Logging should be paired with monitoring, the process of continuously assessing whether your site is running as expected. Monitoring software often pulls key “metrics” from log files to diagnose the health of a web application. Let’s discuss the various ways you can monitor your application.
The most basic form of monitoring is checking that your site is available. “Uptime monitors” like Uptime Robot and Pingdom are free services that will check whether a URL is responding to HTTP requests successfully. (More complex or frequent checks are usually available as a paid option.) This is a simple way to achieve peace-of-mind: if your site ever becomes unavailable, the monitoring software will send an email or text alert to your team.
Capturing performance metrics will give you granular information about the state of your web application. Your monitoring should keep track of:
- Responsiveness: how long it takes to respond to each web request.
- Throughput: how many requests per second are hitting your site.
- Memory usage and server load: how much of each server’s memory space is being used up, and how close each server is to full capacity.
- Database performance: how many queries are being run per second and how many concurrent connections there are.
You should configure your monitoring software to alert your team when unusual errors occur or when performance metrics reach critical conditions. Alerts can be sent over email, instant messaging or text. Large organizations will have support rotas, so at least one engineer is available to respond to alerts at all times.
Being alerted to errors is only useful if there are actions your support engineers can undertake to fix the error condition! Make sure you put together a trouble-shooting plan for a responding engineer to follow, and keep it up to date as you develop your system.
Troubleshooting steps may consist of restarting servers, adding extra servers in times of high load, blacklisting IP addresses when you see malicious traffic, or escalating to team specialists (like network engineers and database administrators) when particular parts of the application appear to be the problem. If the issue starts to affect users, making an announcement on a status page or in a website banner will earn you goodwill.