
File upload server

Started by March 13, 2015 12:11 AM
2 comments, last by Zao 9 years, 8 months ago

My good fellow gamedevers. I need your help once again!

I am thinking of writing a file server that can accept file uploads, authenticate the uploader, and stream the bytes directly to S3. I thought it was going to be quick, but somehow I am having trouble finding a solution.

I was expecting there would be a whole range of solutions and apps out there that could do the job, but Googling here and there doesn't seem to turn up any particular, convincing solution, and I am hesitant to write my own.

My use case:

1. Authenticate the user before accepting the file upload. There will be a token in the Authorization header. The file server should validate the token with the auth server before accepting the upload. HTTP 100-Continue seems to be at play here.

2. Once the user is authenticated, stream the bytes directly to S3. Depending on Content-Type, put files into different S3 buckets. At this point, there's no association/ownership between users and files on S3. File ownership is described in the DB.

3. This is completely optional: show a progress bar to the users. This took me to this awesome page, which describes a protocol for resumable uploads and displaying a progress bar, which implies that a custom homegrown solution is needed.

First, I ran into the nginx upload module. It seems to do the job just fine, but it only appears to write the upload directly to disk, so I would have to write another app that pulls that file from disk and streams it to S3. And what about the authentication beforehand? Unless I am mistaken, this doesn't sound like a good long-term solution.

Then I ran into various blogs of people's endeavors using Passenger/Ruby/Python/PHP/node.js/nginx/jQuery and whatever else, and the solutions seem to be very "heavy" on getting the proper values into your config files. Just no. Some of them are at the hobbyist level, loading the entire file into memory and saving it to disk, which is what the nginx module already does.

This should have been solved already. File upload is like an ancient technology of the Atlanteans. There has to be one true way of doing this, but the answer seems very elusive right now, or maybe I am searching for the wrong thing. My search results did get mixed up with ads for free file hosting services.

It does have an 'ancient' solution: ftp.

If you want a secure version: sftp.

If you're doing it on the server end, use s3fs and configure an FTP server to point to it.

If you want the client to do it, look at products like Cyberduck that can handle S3 on the client end of file transfer.

100-continue is not required, necessary, or a good idea for authentication.

HTTP and HTTPS have a built-in authentication mechanism called "HTTP Basic Auth." It consists of the client sending the base64 encoding of a user name and password in the Authorization header. Because this is close to clear text, you typically want to use HTTPS for this. Web browsers and web servers have this built in.

Once authenticated to a site, the browser will include the HTTP Basic Auth header in each request, so you can tell that the same browser/user is making subsequent requests. NGINX, Apache, and most other HTTP servers support this method out of the box. Some of them can even integrate with LDAP for the name/password database.
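For illustration, a minimal Python sketch of checking an "Authorization: Basic ..." header on the server side; verify_credentials here is a hypothetical hook into whatever name/password store you actually use:

```python
import base64

def check_basic_auth(auth_header, verify_credentials):
    """Return the username if the Basic Auth header checks out, else None."""
    if not auth_header or not auth_header.startswith("Basic "):
        return None
    try:
        # The header value is base64("username:password").
        decoded = base64.b64decode(auth_header[len("Basic "):]).decode("utf-8")
        username, _, password = decoded.partition(":")
    except (ValueError, UnicodeDecodeError):
        return None
    return username if verify_credentials(username, password) else None
```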

Other options for doing HTTP authentication include a login form, some application logic, and setting a session cookie (via the Set-Cookie header) that carries some kind of session ID.
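A rough sketch of that cookie/session approach, assuming Flask and a hypothetical check_password lookup against your own user database:

```python
from flask import Flask, request, session, redirect, abort

app = Flask(__name__)
app.secret_key = "change-me"  # used to sign the session cookie

def check_password(username, password):
    # Hypothetical lookup against your user database.
    return False

@app.route("/login", methods=["POST"])
def login():
    username = request.form.get("username", "")
    password = request.form.get("password", "")
    if not check_password(username, password):
        abort(401)
    # Flask stores this in a signed cookie; subsequent requests carry it back,
    # which is your "session ID" for the upload endpoints.
    session["user"] = username
    return redirect("/upload")
```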

Regarding streaming into S3: because you want to authenticate users and select a bucket based on file type (which requires looking at the headers, and perhaps the payload), you can't stream straight into S3. The typical way this is done is uploading to a temp file on local disk, and then transferring to storage once the upload is complete. The local disk may be the /tmp directory of a server, it may be some filer system, or it may even be some kind of reliable workflow/message queue.
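A minimal sketch of the temp-file approach, assuming the boto3 AWS SDK and a framework that hands you the request body as a file-like stream (names here are placeholders):

```python
import shutil
import tempfile

import boto3  # AWS SDK for Python

s3 = boto3.client("s3")

def spool_then_upload(stream, bucket, key):
    """Spool the incoming body to a local temp file, then hand it to S3."""
    with tempfile.NamedTemporaryFile() as tmp:  # deleted on close (POSIX)
        shutil.copyfileobj(stream, tmp)  # read a chunk, write a chunk
        tmp.flush()
        s3.upload_file(tmp.name, bucket, key)
```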

If you don't want to hold on to the files locally, then you need to do the HTTP protocol streaming bit yourself, in some kind of loop that reads a chunk and writes a chunk after the headers are received and parsed. This is not hard, but it does require extra custom code. It is easier if you use a PUT request, and harder if you use browser-form-based POST requests with multipart encoding.
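A rough sketch of that read-a-chunk/write-a-chunk loop, again assuming boto3, feeding each chunk into an S3 multipart upload instead of a local file; stream, bucket, and key are placeholders for whatever your framework and routing give you:

```python
import boto3

s3 = boto3.client("s3")

def stream_to_s3(stream, bucket, key, chunk_size=8 * 1024 * 1024):
    """Read the request body chunk by chunk and push it into S3."""
    upload = s3.create_multipart_upload(Bucket=bucket, Key=key)
    parts, part_number = [], 1
    try:
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:
                break
            resp = s3.upload_part(Bucket=bucket, Key=key,
                                  UploadId=upload["UploadId"],
                                  PartNumber=part_number, Body=chunk)
            parts.append({"ETag": resp["ETag"], "PartNumber": part_number})
            part_number += 1
        s3.complete_multipart_upload(Bucket=bucket, Key=key,
                                     UploadId=upload["UploadId"],
                                     MultipartUpload={"Parts": parts})
    except Exception:
        # Don't leave half-finished multipart uploads lying around.
        s3.abort_multipart_upload(Bucket=bucket, Key=key,
                                  UploadId=upload["UploadId"])
        raise
```

Note that S3 multipart parts (other than the last) must be at least 5 MB, and boto3's higher-level upload_fileobj() wraps this same pattern if you'd rather not manage the parts yourself.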

I think the expectation that something already exists that accepts file uploads, AND implements your custom logic of user authentication, AND implements your particular back-end storage of S3, is somewhat optimistic. When you have specific, custom requirements, you often need to write some specific, custom code. The good news is that most popular web frameworks (anything from J2EE to Webmachine to Node.js to PHP to Warp) come with the components you need to assemble to make this work.
enum Bool { True, False, FileNotFound };

In the distributed storage software we use, clients authenticate with client certificates over TLS (HTTPS) with the head nodes, which may either proxy the data to the storage pools or redirect the client to issue its PUT directly to the storage pools over unauthenticated HTTP.

The redirect-on-PUT mechanism requires that the client _must_ send an "Expect: 100-continue" header on its initial request. This is so that the server can decide whether to accept the data directly (by responding with 100 Continue), or to redirect/reject/whatever with a 30x/40x before the body is ever sent.
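At the protocol level it looks roughly like the hand-rolled Python sketch below, with hypothetical host/path values; a real client would also need timeouts (servers are allowed to skip the interim response) and proper response parsing:

```python
import socket

def put_with_expect(host, port, path, payload):
    """PUT with Expect: 100-continue; only send the body if the server agrees."""
    sock = socket.create_connection((host, port))
    try:
        headers = (
            f"PUT {path} HTTP/1.1\r\n"
            f"Host: {host}\r\n"
            f"Content-Length: {len(payload)}\r\n"
            f"Expect: 100-continue\r\n"
            f"Connection: close\r\n\r\n"
        )
        sock.sendall(headers.encode("ascii"))
        interim = sock.recv(4096).decode("iso-8859-1")
        if interim.startswith("HTTP/1.1 100"):
            # Server agreed to take the data directly.
            sock.sendall(payload)  # payload is bytes
            return sock.recv(4096).decode("iso-8859-1")
        # Otherwise the server redirected or rejected (30x/40x) before any
        # body was sent; the status line tells the client what to do next.
        return interim
    finally:
        sock.close()
```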

Our storage uses dynamically opened ports on the storage nodes and a GUID as a query parameter to figure out where to put stuff, but it's in no way secure. In your case, you'd either have to delegate credentials to the clients as part of the URLs, or make your S3 bucket world-writable and pray.
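With S3 specifically, one common way of delegating credentials in the URL is a time-limited pre-signed URL; a small sketch assuming boto3, with hypothetical bucket/key names:

```python
import boto3

s3 = boto3.client("s3")

def make_upload_url(bucket, key, expires_seconds=900):
    """After authenticating the user yourself, hand back a time-limited URL
    the client can PUT to directly, without the bucket being world-writable."""
    return s3.generate_presigned_url(
        "put_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires_seconds,
    )
```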

To make it is hell. To fail is divine.

This topic is closed to new replies.
