Thursday, December 31, 2009

Suggest some price oriented setups for a large video hosting site

SkyHi @ Thursday, December 31, 2009

I'm looking to build a video hosting service from scratch. I have access to several traffic channels, so the service will see immediate usage. I want to see what options are out there for doing this.

The function of the site is simple.

A user uploads a video in any video format. The video is converted to mp4 and 3gpp formats, after which it's available to be streamed via a private (or public) URL.

I estimate there to be several terabytes of data within the first 4 months, with over 3 gigabits of bandwidth usage.

Speed is key. I want videos to load quickly, but I also don't want to spend a fortune on $20,000 servers.

Can you guys recommend a simple solution that can be scaled from 1 to 100 servers?


What I originally thought about is this:

    * 1 front end server for the DB and mysql (mysql then could be moved to its own server)
    * 1 conversion server
    * 1 media server

The problem that "scares" me is that a single server will have about 500-800 GB of space (15k SAS drives in RAID 5). Once that's exceeded, adding new servers and just keeping the ID of the server in the master file list isn't a big deal. After some time, though, this will be very inefficient: once a server fills up and its files get old, fewer people will access them, so it becomes an over-powered archive server, which is a waste. I want to avoid this problem.
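One common way to keep full servers from turning into over-powered archives is to track file locations in the master list and periodically move cold files to cheaper storage. This is not something proposed in the post itself. The following is only a minimal sketch of that idea; the node names, the "hot"/"archive" tier labels, and the 30-day idle threshold in the usage note are all hypothetical assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class StorageNode:
    name: str
    tier: str            # assumed labels: "hot" (fast disks) or "archive" (cheap disks)
    capacity_gb: float
    used_gb: float = 0.0

    def free_gb(self) -> float:
        return self.capacity_gb - self.used_gb


@dataclass
class VideoRecord:
    video_id: str
    size_gb: float
    node: str            # name of the node currently holding the file
    last_access: datetime


class MasterFileList:
    """Maps each video ID to the storage node that holds it."""

    def __init__(self, nodes):
        self.nodes = {n.name: n for n in nodes}
        self.videos = {}

    def _pick_node(self, tier, size_gb):
        # Choose the node in the tier with the most free space.
        candidates = [n for n in self.nodes.values()
                      if n.tier == tier and n.free_gb() >= size_gb]
        if not candidates:
            raise RuntimeError(f"no {tier} node has {size_gb} GB free")
        return max(candidates, key=lambda n: n.free_gb())

    def add_video(self, video_id, size_gb, now):
        # New uploads always land on the hot tier.
        node = self._pick_node("hot", size_gb)
        node.used_gb += size_gb
        self.videos[video_id] = VideoRecord(video_id, size_gb, node.name, now)
        return node.name

    def demote_cold(self, now, max_idle):
        # Move videos idle for longer than max_idle from hot to archive nodes.
        moved = []
        for rec in self.videos.values():
            src = self.nodes[rec.node]
            if src.tier == "hot" and now - rec.last_access > max_idle:
                dst = self._pick_node("archive", rec.size_gb)
                src.used_gb -= rec.size_gb
                dst.used_gb += rec.size_gb
                rec.node = dst.name
                moved.append(rec.video_id)
        return moved
```

Calling `demote_cold(now, timedelta(days=30))` from a nightly job would free space on the fast servers while the master list still resolves every video ID to its current location. The actual file copy between machines is left out of the sketch.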


Ah, our favourite kind of question: you want a new system to be fast, reliable, flexible and cheap too. We only get this kind of post every few days ;)

I've been doing consumer-oriented video streaming for over four years now and you can't have it all; somewhere you're going to have to compromise. That said, based on the assumption that you have no QoS or client SLA (time to session start, dropped frames, etc.) to worry about, it sounds like you're going to need the following:

  • 2 x medium/heavy-duty firewalls/load-balancers/routers or combination devices for the client-facing stuff.
  • 2 x light-duty combo devices (with multi-VPN capability) for the incoming services.
  • At least 2 high-CPU, low-memory, low-local-storage servers to do your VoD encoding.
  • Between 2 and 4 (initially) medium-CPU, high'ish-memory, low-local-storage VoD streamers.
  • A couple of dedicated beefy DB servers.
  • As many front-end application servers as you think you'll need to create your 'catalogue' pages, check any entitlement requirements pre-play and generate the URLs you'll be passing to your clients for playback from the VoD streamers.
  • At least 2 web servers for serving the catalogue and handing over the URLs to the clients.
  • Presumably some form of MIS system to monitor service and report on activity for commercial and SLA adherence reasons.
  • Two dual-controller SAN/NAS boxes: one 'internal' for the DBs, VoD importing process, any VMs, build code, dev and any other safe storage; and a second just for holding your post-processed VoD content that lives with the streamer servers. Mixing these two functions will cost you somewhere down the line, so split them now.
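The application-server step in the list above (check entitlement, then hand the client a playback URL) is often done with short-lived signed URLs. The following is a minimal sketch of that pattern using only the Python standard library. The `/vod/<id>.mp4` path layout, the `expires` and `sig` query parameters, the example hostname and the shared secret are all hypothetical assumptions, not part of the setup described here.

```python
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical shared secret known to the app servers and the VoD streamers.
SECRET = b"replace-with-a-real-secret"


def sign_playback_url(base_url, video_id, expires_at, secret=SECRET):
    """Build a playback URL that is only valid until expires_at (Unix seconds)."""
    path = f"/vod/{video_id}.mp4"  # assumed path layout on the streamer
    message = f"{path}:{expires_at}".encode()
    sig = hmac.new(secret, message, hashlib.sha256).hexdigest()
    query = urlencode({"expires": expires_at, "sig": sig})
    return f"{base_url}{path}?{query}"


def verify_playback_url(url, now, secret=SECRET):
    """Return True if the signature matches and the URL has not expired."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires_at = int(params["expires"][0])
        sig = params["sig"][0]
    except (KeyError, ValueError):
        return False
    if now > expires_at:
        return False
    message = f"{parsed.path}:{expires_at}".encode()
    expected = hmac.new(secret, message, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking signature bytes via timing.
    return hmac.compare_digest(sig, expected)
```

An application server would call `sign_playback_url("http://vod1.example.com", "abc123", int(time.time()) + 300)` after the entitlement check passes, and each streamer would run `verify_playback_url` before serving the file. That way the streamers need no database access of their own.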

Personally I prefer FC over anything Ethernet-based as it grows much more predictably, but budgets may play a part. If you have to use NAS, use NFS rather than CIFS/SMB; it's so much more mature. Oh, and use multipathing where you can too.

Of these you want to make sure your DB boxes are 'solid' and that you have your network sorted first and foremost, everything else is fairly easy to deal with. Don't go silly buying 10Gbps NICs for your VoD servers just yet, 3Gbps of concurrent traffic is actually very low and you can pretty much guarantee that even the worst-configured modern server will hit 1Gbps of streaming 24/7, so simply get a bunch of cheapo ones so you have hardware resilience for when they break.
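The sizing argument above is simple arithmetic, sketched below. The 1 Gbps per server and 3 Gbps peak figures come from the discussion; the single spare server and the 500 kbps per-stream bitrate are assumptions added purely for illustration.

```python
import math


def streamers_needed(peak_gbps, per_server_gbps=1.0, spares=1):
    """Servers required to carry the peak load, plus spares for hardware failures."""
    if per_server_gbps <= 0:
        raise ValueError("per_server_gbps must be positive")
    return math.ceil(peak_gbps / per_server_gbps) + spares


def concurrent_streams(total_gbps, stream_kbps):
    """How many simultaneous viewers a given amount of bandwidth supports."""
    if stream_kbps <= 0:
        raise ValueError("stream_kbps must be positive")
    total_kbps = total_gbps * 1_000_000  # 1 Gbps = 1,000,000 kbps
    return int(total_kbps // stream_kbps)


print(streamers_needed(3.0))         # 3 Gbps peak at 1 Gbps each, plus one spare
print(concurrent_streams(3.0, 500))  # viewers at an assumed 500 kbps per stream
```

With these inputs the first call gives 4 servers, which lines up with the "between 2 and 4 (initially)" streamers suggested in the list, and the second gives 6000 concurrent viewers at the assumed bitrate.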

I'd be strongly tempted to use blades for everything and VMware for everything except the DB and VoD servers, but you may not have the budget. Those two technologies can certainly help with smooth future expansion, but their entry costs aren't low.