Document Security in Web Applications
Andres Desa
03/21/2005
Organizations publish information online, including confidential data. The data is
rendered in varied formats, from simple HTML pages to documents in Adobe's PDF
or Microsoft's Word/Excel formats. Confidential data is restricted to a set of users
who must log in and be authenticated on the website. A common example is an
online banking system in which a customer's personal statements are made
available as PDF files. These files contain sensitive information and must not be
made available to any other user. Mechanisms to protect data rendered as HTML
are well established, but the same is not true of document protection. Displaying
confidential data in documents raises the following issues:
- Access Protection
The documents should be protected so that only the authorized user can view
them. Example: A bank customer should be the only person allowed to view
his bank statement.
- Document Storage
The documents should not be stored in any format in any location that can be
accessed by other users. Example: The document should not be stored in
caches of browsers that can be later viewed by other users.
We first look at the traditional approaches that are widely followed for
document delivery and their inherent weaknesses. We then discuss an
approach that mitigates most of these risks and provides a secure environment to
deliver confidential documents. The recommended solution for displaying documents
in a secure manner is to stream the document after proper authentication and to
use the "no-store" cache control directive.
Document Display - Traditional Approach
In this section, we discuss some of the common mechanisms used to display
documents and the corresponding risks.
Common user access
The most widely used approach is to place the protected documents in a
folder that is not directly accessible from the Internet. All the documents that need
to be accessed are stored in a folder that grants READ permission to a single user
account. In an IIS setup, this single user account is IUSR_<computer name>.
Once the web user is authenticated, that user is mapped to the single user account
that has permission to read the documents in the folder, and the requested document
is then displayed to the user. Because the documents are not stored in a
publicly accessible folder, a web spider tool[1] cannot access them.
Risks
The risks involved in such an implementation are:
- An authenticated user can access documents that belong to another user.
The user can guess (brute force) the names of documents that belong to
other users, and any document he requests will be displayed to him.
For example, assume User A is an authenticated user with access to the
document at http://org.name/usera.pdf, and another authenticated User B
has permission to view the document at http://org.name/userb.pdf.
If User B requests the document "usera.pdf", it will be displayed to him,
because the user account used to read the documents is the same for all
authenticated users.
- The documents that are viewed are also stored in the local cache of the
browser.
Direct URL access
An approach that is easy to implement, although very rarely seen in practice, is to
provide the user with the full path of the document. The document will be located in
an Internet accessible folder that has no permissions set on it. Once a user is
authenticated by the website, the user is allowed to access the documents. The full
path of the location of the document will be displayed to the user.
Risks
The risks that are involved in such an implementation are:
- Simply spidering such a website will reveal all the documents that are
meant to be viewed only after authentication.
- Once a user knows the URL of a document, he can request it directly
without logging in.
- The documents that are viewed are also stored in the local cache of the
browser.
Secure Document Delivery
The proposed solution for secure delivery of documents involves two steps:
- Render the document after proper authentication
- Use secure cache control directives
Rendering a document
Rendering documents from the server, rather than linking to them directly, helps
display files securely in two ways.
- File Path Protection
It allows the documents to be located in a folder that is not publicly
accessible, and the document path is not displayed to the end user. When the
user asks to view the document, the browser does not request the document
directly; instead, it requests a script file, and it is this script that
renders the document. The path of the document to be rendered is made
available to the script either by obtaining it from a database or by
hard-coding it in the script. The script file cannot be viewed by the end
user, so a hard-coded document path is not revealed.
Hence the actual location of the document on the web server is never revealed
to the end user.
- Authentication
Rendering also provides a place to enforce authentication for the document
that is to be displayed. The authentication module runs before the document
is rendered. It should check for a valid user login and should also verify
that the user has access to the document being requested.
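The two checks above can be sketched as follows. This is a minimal illustration in Python, not code from the original article: the `DOCUMENTS` catalogue, the document IDs, the file paths, and the `resolve_document` helper are all hypothetical, and a real application would look the record up in its database.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DocumentRecord:
    owner: str  # the only user allowed to view the document
    path: str   # server-side location, never sent to the browser


# Hypothetical catalogue standing in for a database table.
DOCUMENTS = {
    "stmt-1001": DocumentRecord(owner="usera", path="/srv/private/usera.pdf"),
    "stmt-1002": DocumentRecord(owner="userb", path="/srv/private/userb.pdf"),
}


def resolve_document(logged_in_user: Optional[str], document_id: str) -> Optional[str]:
    """Return the server-side path only for a logged-in user who owns the document."""
    if logged_in_user is None:
        return None  # no valid login
    record = DOCUMENTS.get(document_id)
    if record is None:
        return None  # unknown document ID
    if record.owner != logged_in_user:
        return None  # document belongs to another user
    return record.path
```

The browser only ever sees an opaque ID such as "stmt-1001", so guessing another user's file name does not help, and a request by User B for User A's statement is refused.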
After authentication the contents of the document must be "streamed" to the
browser. The streaming of a document is done by sending it as binary data. The
media type of the document has to be specified to the browser by setting the
appropriate CONTENT-TYPE header. This is to ensure that the client browser can
then display the document with the help of the appropriate plug-in. An example of
the use of the Content-type header is:
Content-type: application/msword
This header specification will be used to display a Microsoft Word document.
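A minimal Python sketch of this step is shown here, assuming the document has already been located on the server. The `MEDIA_TYPES` table and both helper names are illustrative assumptions, not part of the original article.

```python
import os
from typing import Iterator, Tuple

# Assumed mapping from file extension to media type; extend as needed.
MEDIA_TYPES = {
    ".pdf": "application/pdf",
    ".doc": "application/msword",
    ".xls": "application/vnd.ms-excel",
}


def content_type_header(path: str) -> Tuple[str, str]:
    """Pick the Content-Type header for a document from its file extension."""
    extension = os.path.splitext(path)[1].lower()
    # Fall back to a generic binary type for unknown extensions.
    return ("Content-Type", MEDIA_TYPES.get(extension, "application/octet-stream"))


def stream_file(path: str, chunk_size: int = 8192) -> Iterator[bytes]:
    """Yield the file's contents as binary chunks, without decoding them as text."""
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            yield chunk
```

Opening the file in binary mode ("rb") matters: PDF and Word files are not text, and any text-mode translation would corrupt them.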
Even though the document is being rendered or streamed from the server, the
browser still stores the contents of the document in a temporary space, its local
cache, and then displays it from its cache. We need to protect documents from
being accessed from local cache and this can be done using cache control directives.
Secure Cache Control Directives
HTTP headers carry the cache control information. We need to set appropriate
headers so that the documents are not cached. In the current specification, HTTP
1.1, the CACHE-CONTROL header provides a wide range of directives that
control browser caching. Listed below are the ones that can be used
in the current solution.
No-cache: This directive tells the browser that it must revalidate the document with
the server before reusing a copy from its local cache. The copy may still be stored.
No-store: This directive ensures that the document is not stored persistently
by either remote or local caches.
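As a small illustration of how these directives travel in a response, the Python sketch below builds the header as a (name, value) pair. The helper name `cache_control_header` is an assumption made for this example, and it deliberately accepts only the two directives discussed here.

```python
from typing import Tuple

# Only the two directives discussed in this section are accepted.
SUPPORTED_DIRECTIVES = {"no-cache", "no-store"}


def cache_control_header(*directives: str) -> Tuple[str, str]:
    """Build a Cache-Control header from one or more directives."""
    if not directives:
        raise ValueError("at least one directive is required")
    for directive in directives:
        if directive not in SUPPORTED_DIRECTIVES:
            raise ValueError("unsupported directive: " + directive)
    # Multiple directives are sent as a comma-separated list.
    return ("Cache-Control", ", ".join(directives))
```

Calling `cache_control_header("no-store")` gives the pair that is sent on the wire as `Cache-Control: no-store`.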
How to use cache control directives?
Using cache control directives is easier said than done. Browsers such as Internet
Explorer (IE) and Mozilla implement the cache control directives differently,
and there are bugs in certain scenarios when the directives are
set. Some of these issues are highlighted in the sections below and summarized
in a table at the end.
Cache control in Mozilla
On an HTTP connection, Mozilla caches documents rendered to the browser
even if the "no-cache" directive is set. If the "no-store" directive is set
instead, the document is not stored in its local cache.
Over an HTTPS connection, the Mozilla browser does not cache any pages by default.
Cache control in Internet Explorer
Internet Explorer does not cache the rendered document on an HTTP connection when
either the "no-cache" or the "no-store" directive is set. On an HTTPS connection with
the "no-cache" directive set, IE tries to download the main page instead of
rendering the document. This can be seen in the figure shown below. Selecting either
Open or Save gives an error as shown in the figure below. This error is documented in
the Microsoft Knowledge Base as article ID 316431. The "no-store" directive on an
HTTPS connection overcomes this issue and also does not store the document in the
cache of the browser.
Cache control Implementation in Browsers
The following table lists the caching status for various browsers for the different
cache control directives:
What is the optimum use of cache control directives?
From a security standpoint, it is safest not to have the browser cache the document
contents. Based on the above table, a secure way to display documents is to use
the "no-store" directive over an HTTPS connection. All
browsers supporting HTTP 1.1 support this directive.
Note: All browsers that were released after the year 1996 are HTTP 1.1 compliant. This
includes browsers like Netscape Navigator 2.0 onwards and Internet Explorer 3.0
onwards. Web servers like Apache HTTP Server 1.2 onwards, Microsoft's IIS 4.0
onwards and Netscape Enterprise server 3.0 onwards are HTTP 1.1 compliant.
The "no-store" directive is not available in the HTTP 1.0 specification.
Sample Implementation
The model code for the solution discussed is presented below in ASP code. The same
can be implemented in other languages. The code has to send the requested file as
binary data, set the cache control directives to "no-store", set the content-type
header to the appropriate type and provide a module for authentication.
The reading of binary files from the server’s file system through ASP and then
sending the content to the client's browser can be achieved with the help of the
ADODB.Stream object and the BinaryWrite method from the ASP Response object.
The ASP Response object is also used to set the Cache-Control directives
and the Content-Type header.
START CODE
END CODE
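The same steps can be sketched in Python using the standard WSGI interface rather than ASP. This is a minimal illustration under stated assumptions, not the ASP listing: the `DOCUMENTS` table, its document ID and path, and the `doc` query parameter are hypothetical, and the login is assumed to have been handled by an earlier layer that sets `REMOTE_USER` in the request environment.

```python
from urllib.parse import parse_qs

# Hypothetical catalogue: document ID -> (owner, server-side path, media type).
# A real application would read this from its database.
DOCUMENTS = {
    "stmt-1001": ("usera", "/srv/private/usera.pdf", "application/pdf"),
}


def application(environ, start_response):
    """WSGI entry point: authenticate, authorize, then stream with no-store."""
    # Step 1: require a valid login (assumed to be supplied as REMOTE_USER).
    user = environ.get("REMOTE_USER")
    if not user:
        start_response("403 Forbidden", [("Content-Type", "text/plain")])
        return [b"Login required"]

    # Step 2: look up the document by opaque ID and verify ownership.
    query = parse_qs(environ.get("QUERY_STRING", ""))
    document_id = query.get("doc", [""])[0]
    entry = DOCUMENTS.get(document_id)
    if entry is None or entry[0] != user:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"Document not found"]

    # Step 3: read the file as binary data; the path never reaches the browser.
    _owner, path, media_type = entry
    with open(path, "rb") as handle:
        body = handle.read()

    # Step 4: send the media type and the no-store directive with the content.
    headers = [
        ("Content-Type", media_type),
        ("Cache-Control", "no-store"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]
```

For local experiments the function can be served with `wsgiref.simple_server.make_server`, although that server speaks plain HTTP only; the recommendation in this article is to deliver such documents over HTTPS.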
References
- HTTP 1.1 RFC http://www.ietf.org/rfc/rfc2616.txt
- How to read and display binary data in ASP http://support.microsoft.com/default.aspx?scid=kb;en-us;193998
- Internet Explorer Is Unable to Open Office Documents from an SSL Web Site http://support.microsoft.com/default.aspx?scid=kb;en-us;316431
- SSL Caching in Mozilla http://www.mozilla.org/docs/netlib/cachefaq.html
- HTTP Caching in Mozilla http://www.mozilla.org/projects/netlib/http/http-caching-faq.html
Footnotes
1 A web spider is a program that automatically and recursively follows the hypertext links on a Web site. Most of the search engines on the Web make use of web spiders to gather information regarding Web sites.