If you're a test pilot, pushing the envelope is what you do to discover an aircraft's performance limits. I'm just a private pilot, so I try to stay away from the boundaries of my aircraft's envelope as much as possible. But there are other envelopes I can play with, and Office 2.0 is one of them. Over the past two years, I have conducted many experiments, attempting to do everything online. I followed the Rules for Office 2.0 religiously, and eventually made it work. Today, using an iPhone and a MacBook Air equipped with a Franklin CDU-680 3G modem, I can get online pretty much anytime and anywhere. I built an Office 2.0 setup around Gmail and Salesforce.com, and I must say that it's working pretty well. So, what's next? Well, all this works when you're managing megabytes of structured data and gigabytes of documents. But what about hundreds of gigabytes of structured data and tens of terabytes of documents? Can you make the overall workflow scale by three or four orders of magnitude? This is one of the questions I will try to answer over the coming months, and I will make sure to share my progress on this blog.
As of today, only large corporations, online service providers, and video production companies need to manage terabytes of digital assets. But with the adoption of digital SLR cameras and digital camcorders by the general public, it's only a matter of time until the average user needs to handle several terabytes of data on a regular basis. Hard drives are cheap: you can buy a 1TB Hitachi Deskstar 7K1000 for less than $300 today. But over the past few years, the price per gigabyte of storage has dropped much faster than the price per megabit of Internet bandwidth. As a result, moving large amounts of digital assets online is becoming increasingly difficult. Also, dealing with digital assets such as movies stored on DVDs and ripping them (for your personal entertainment only) is a fairly complex process, requiring the use of multiple tools and powerful desktop computers. Today, we know how to deal with gigabytes of data. If I want to share pictures with my friends, I can simply upload them onto Flickr, or attach them to an email. I can also use countless online services for backing them up. But I cannot do that for my movie collection, and this is the problem I'm trying to solve, or at least learn a thing or two about.
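To put the bandwidth problem in numbers, here is a back-of-the-envelope calculation. The uplink speed below is an illustrative assumption, not a figure from this post:

```python
# Back-of-the-envelope: how long does it take to push a collection
# through a typical consumer uplink? The 1 Mbit/s uplink speed is an
# illustrative assumption.

def upload_days(data_tb: float, uplink_mbps: float) -> float:
    """Days needed to upload data_tb terabytes at uplink_mbps megabits/s."""
    bits = data_tb * 1e12 * 8            # decimal terabytes -> bits
    seconds = bits / (uplink_mbps * 1e6)  # megabits/s -> bits/s
    return seconds / 86400                # seconds -> days

# 5 GB of photos over a 1 Mbit/s uplink: under half a day.
print(round(upload_days(0.005, 1.0), 2))  # → 0.46
# 10 TB of ripped DVDs over the same uplink: roughly two and a half years.
print(round(upload_days(10, 1.0)))        # → 926
```

This is why the hard part of the experiment is not storing the terabytes locally, but getting them on and off the network.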
To support my little experiment, I am building an oversized desktop computer. It does not look anything like what people will use two or three years down the road to store a dozen terabytes of home videos, but it does not really matter. By then, Internet connections won't be radically faster, and the workflow problems I am facing today will be faced by most users tomorrow. The main requirement for this computer is to be able to store tens of terabytes of data, so the first component I selected was the enclosure, the solid Chenbro RM51924B, which provides 24 hot-swappable hard drive bays, and two internal ones used for the operating system and software files.
To control 24 hard drives, I opted for the Areca ARC-1280ML, which provides six Mini SAS ports, each capable of connecting to four SATA II drives. This controller natively supports RAID 6, which allows two drives to fail simultaneously without losing any data, by using two drives for parity. As a result, I get 22TB of addressable storage space by using 24 Hitachi Deskstar 7K1000 1TB drives.
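The 22TB figure follows directly from the RAID 6 layout, in which the equivalent of two drives is reserved for parity. A quick sanity check:

```python
# RAID 6 usable capacity: the equivalent of two drives' worth of space
# goes to parity, so n drives of size s yield (n - 2) * s of
# addressable storage.

def raid6_capacity_tb(num_drives: int, drive_tb: float) -> float:
    """Addressable capacity of a RAID 6 array, in terabytes."""
    if num_drives < 4:
        raise ValueError("RAID 6 needs at least 4 drives")
    return (num_drives - 2) * drive_tb

# 24 x 1TB Hitachi Deskstar 7K1000 drives behind the Areca ARC-1280ML:
print(raid6_capacity_tb(24, 1.0))  # → 22.0
```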
For the motherboard, I went for the ASUS DSEB-DG, but the upcoming ASUS Z7S WS would be my top choice if I were to do it again (better connectivity). It is home to two Intel Quad-Core Xeon X5450 3GHz processors and a pair of Crucial 240-pin 4GB DIMM memory modules, providing a total of 8GB of RAM. As for the graphics controller, I went for the NVIDIA Quadro FX 5600G, but the upcoming NVIDIA GeForce 9800 GX2 would not have been a bad option either. Just don't ask me why I needed such a powerful GPU, for I'm still trying to figure it out myself...
Then came the question of the operating system. Since I mainly use computers running Mac OS X on a daily basis, I wanted to discover what was available out there, and decided to go for Windows Vista. I picked the Ultimate edition, using the 64-bit version in order to be able to use more than 4GB of RAM (the ASUS DSEB-DG supports up to 64GB, just in case you were wondering).
What I did not realize is that such a system would require a massive power supply, as well as quite a few fans to dissipate the heat created by the roughly 1,000 watts consumed while running at full speed. And the fans that come with my enclosure are not exactly quiet. The Chenbro RM51924B is a 5U rackmount unit designed to be deployed in a data center, where workers typically wear ear plugs. While power consumption and heat dissipation are critical factors when designing server hardware, noise level is not, and I learned this the hard way the first time I turned the whole system on. Since I wanted to leave it on my desktop (or next to it), I quickly looked for a fix that would prevent early loss of hearing.
The first solution I tried was to replace the stock fans that came with the enclosure with quieter ones from Acousti, Noctua, and Silenx. It did the trick for the main enclosure, but not for the power supplies (four redundant units), which use tiny fans spinning at very high RPM. I could have replaced them with quieter ones as well, but I was concerned that the airflow would not be sufficient, leading to overheating and eventually burning out the power supplies. Not a good thing, obviously.
The second (and last) solution I went for was to use a quiet external power supply (Cooler Master Real Power M1000) and a liquid cooling system, the last parts of which I received this morning. It is built around the Zalman Reserator XT unit, using CPU, GPU, chipset, and memory cooling blocks from D-TEK, Koolance, Thermaltake, and Zalman, as well as a whole bunch of fitting accessories sourced from FrozenCPU.com. The Zalman Reserator XT is gorgeous, but provides a fairly limited heat dissipation capacity, due to a limited flow of coolant (distilled water with anti-corrosion additives). As a result, I might have to use two or three units in parallel. I expect to be done with the overall setup within a couple of weeks. In the meantime, I am running the system with silent fans and the enclosure's cover open for improved heat dissipation. Somehow, my office has been getting quite hot by the end of the day lately...
Once I'm done, the system should be very quiet (less than 25dB), relatively cool (less than 30 degrees Celsius), and quite powerful (8 CPU cores, 8GB to 32GB of RAM, 22TB of RAID 6 addressable storage). All with nice graphics and sound. Now, let's talk about what I'm going to do with this monster of a computer. The overall idea is to experiment with user workflows for managing large amounts of digital assets. Creating, sending, sharing, backing up, and archiving terabytes of data will be the main activities I focus on, and for automating the end-to-end process, I will use no other tool than Intalio. The reason is pretty simple: we just released version 5.0 of this enterprise-class BPMS, and for the very first time it's easy enough to use that I can develop applications with it myself. I will define a user scenario (creating a library of 1,000 movies), attend one of our training sessions, install the software on my new machine (called UltraVista), model digital asset management processes using Intalio|Designer, integrate them with my local DVD ripping tools using the Microsoft .NET Framework, and deploy them on Intalio|Server. Some processes will be deployed locally, while others will be deployed on the upcoming Intalio|Server On Demand powered by Amazon EC2 (to be announced later this month). The overall project will be documented through regular posts.
The movie library will include 500 classic movies from the Criterion Collection, the Eclipse Collection, and Janus Films, plus 500 contemporary movies and documentaries, with a strong focus on alternative production companies such as Magnolia Pictures or Participant Media. The scenario goes like this: a robotic DVD drive like Ripfactory's Ripstation Lite Pro will rip batches of up to 25 DVDs at a time, losslessly (up to 8.5GB per DVD). The rips will then be converted into MP4 format, selecting the right audio track (original soundtrack) and the appropriate subtitles for English-speaking French people (no subtitles when the original soundtrack is in French). A copy of the MP4 files will be uploaded onto a private Amazon S3 account, and a backup copy of the original files onto an online storage service that offers very large storage capacities (hopefully Egnyte). To make it even more interesting, the scenario will also include the automated fetching and semi-automated formatting of DVD covers, using sites such as Cdcovers.cc, and image processing tools such as GIMP or Adobe Photoshop. Images will be stored using Alfresco Image Management, and images that need manual editing will lead to the creation of workflow tasks using Intalio|Workflow. Ideally, these tasks will be created in Salesforce.com, so that I can use it as the primary end-user interface for managing my workflow. Progress in the overall ripping, conversion, and editing process will be reported on a public website using the upcoming Intalio|BAM component, for which custom RSS feeds will be developed as well.
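To make the per-DVD workflow concrete, here is a minimal sketch of it in plain Python. Every name below is hypothetical: the real process would be modeled in Intalio|Designer and call external ripping and conversion tools, while this only captures the ordering of the steps and the subtitle rule:

```python
# Hypothetical sketch of the per-DVD workflow, NOT Intalio's API.
# It models the step ordering and the subtitle-selection rule only;
# all paths, bucket names, and service names are placeholders.

from dataclasses import dataclass
from typing import Optional

@dataclass
class Dvd:
    title: str
    audio_language: str  # language code of the original soundtrack

def subtitle_track(dvd: Dvd) -> Optional[str]:
    """Subtitles for English-speaking French viewers: none when the
    original soundtrack is already in French, French otherwise."""
    return None if dvd.audio_language == "fr" else "fr"

def process_batch(batch: list) -> list:
    """Plan the rip -> convert -> upload -> backup steps for a batch
    of up to 25 DVDs loaded into the robotic drive."""
    plan = []
    for dvd in batch:
        plan.append({
            "title": dvd.title,
            "lossless_rip": f"/rips/{dvd.title}.iso",          # up to 8.5GB
            "mp4_copy": f"/mp4/{dvd.title}.mp4",
            "subtitles": subtitle_track(dvd),
            "mp4_destination": f"s3://private-bucket/{dvd.title}.mp4",
            "backup_destination": f"backup-service/{dvd.title}.iso",
        })
    return plan

batch = [Dvd("Le Samourai", "fr"), Dvd("Seven Samurai", "ja")]
for step in process_batch(batch):
    print(step["title"], "->", step["subtitles"])
# → Le Samourai -> None
# → Seven Samurai -> fr
```

The tasks that need human attention (cover formatting, manual image edits) would branch off this flow into Intalio|Workflow rather than being handled inline.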
Ultimately, the entire process will be released under an open source license on the Intalio Community Website. Along the way, I will report on my experience learning how to use Intalio with as many details as possible, share the good, the bad, and the ugly (like I did for the last Office 2.0 Conference), and make recommendations for improvements at the product level. Hopefully, this little experiment should have a positive impact on the product, and might inspire some of you to come up with interesting use cases as well. Wish me luck!
PS: Actual pictures of UltraVista coming soon...