Web Technology
Ankoder - Video Encoding on Demand
Rex Chung wrote to tell me that his EC2-powered Ankoder site can now segment files for iPhone HTTP streaming. As noted in the iPhone Dev Center, HTTP streaming obviates the need for specialized servers, extra firewall entries, and other complexity. It supports both live and video-on-demand sessions.
The Ankoder blog contains a complete walkthrough of the encoding and segmenting process.
The process proceeds as follows:
- Enter URLs for notification, external file storage (S3, FTP, and SFTP are supported), and thumbnail storage. Files stored in S3 can be marked as publicly viewable with a checkbox.
- Choose the input format, bit rate, and screen size. Presets are provided for common cases such as "iphone streaming 500 kbit/second."
- Set up the video segmentation. You can choose to start from the beginning of the input file or skip ahead any number of seconds, and you can decide how many seconds of video should be included in each segment.
- Define the output format or formats. You can encode to multiple formats at once to reduce bandwidth costs.
- Upload a file from your browser or use the Ankoder API to do it programmatically.
- Await notification that the work has been done. Ankoder uses EC2 High CPU medium instances and starts additional servers on demand to ensure that the maximum wait time is just 10 minutes.
Pricing is determined by the amount of upload and download bandwidth consumed during the encoding process. Bandwidth is charged at $0.002 per megabyte ($2.00 per gigabyte). You can transcode 100 50 megabyte videos into 3 formats for about $25.00.
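The pricing arithmetic above can be sketched in a few lines. This is only an illustration: the function name and the 25 megabyte output size are assumptions (the post does not say how large the encoded files are), chosen because they are consistent with the quoted figure of about $25.00.

```python
# Rate quoted in the post: $0.002 per megabyte of upload or download.
PRICE_PER_MB = 0.002


def transcode_cost(num_videos, input_mb, output_mb_per_format, num_formats):
    """Estimate the bandwidth charge for a batch of encoding jobs.

    Assumes each input file is uploaded once and each encoded output
    is downloaded once. Output size is a guess, not a published number.
    """
    upload_mb = num_videos * input_mb
    download_mb = num_videos * num_formats * output_mb_per_format
    return (upload_mb + download_mb) * PRICE_PER_MB


# 100 videos of 50 MB each, 3 output formats, assumed 25 MB per output.
cost = transcode_cost(100, 50, 25, 3)
print(f"${cost:.2f}")
```

Because the input is uploaded only once no matter how many formats are produced, encoding to several formats in one job costs less than submitting the same file once per format.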
-- Jeff;
Lower Prices for EC2 Windows Instances using Authentication Services
We've removed the distinction between Amazon EC2 running Windows and Amazon EC2 running Windows with Authentication Services, allowing all of our Windows instances to make use of Authentication Services such as LDAP, RADIUS, and Kerberos. With this change, any Windows instance can host a Domain Controller or join an existing domain. File sharing services such as SMB between instances will now automatically default to SMB-over-TCP in all cases, and will also be able to negotiate more secure authentication.
Existing Windows with Authentication Services instances will now be charged the same price as Windows instances, a savings of 50% on the hourly rate. All newly launched instances will be charged the new, lower price (starting at 12.5 cents per hour for a 32-bit instance in the US). Applications requiring logins can now be run on the Amazon EC2 running Windows AMIs.
As a result of these changes, our Windows AMI lineup now looks like this:
- US:
  - Amazon EC2 running Windows (32 bit) - English.
  - Amazon EC2 running Windows (64 bit) - English.
  - Amazon EC2 running Windows With SQL Server (64 bit) - English.
- Europe:
  - Amazon EC2 running Windows (32 bit) - English, German, French, Spanish, Italian.
  - Amazon EC2 running Windows (64 bit) - English, German, French, Spanish, Italian.
  - Amazon EC2 running Windows With SQL Server (64 bit) - English, German, French, Spanish, Italian.
If you are using Amazon DevPay in conjunction with Amazon EC2 running Windows with Authentication Services you will need to create new AMIs and adjust your pricing plan before November 1, 2009.
We continue to strive for simplicity and cost effectiveness; this is a good example of both!
-- Jeff;
PS - I know that a lot of you have been asking us to support Windows Server 2008. I don't have a release date for you yet, but I can assure you that we've prioritized the work needed to properly support it.
The Business of Cloud Computing is Booming
Trying to look beyond the hype and see if people are actually making money or getting work done should be the real litmus test in terms of gauging the business opportunity for cloud computing -- and at the end of the day it's probably a better statistic. But then again these sorts of "real world" revenue & sales pipeline stats are not nearly as easy to get. So I thought I'd take a moment, and discuss some of the recent success we've seen in our segment of the cloud world.
Generally, the Fall tends to be the hot sales season in IT where IT folks are coming back from summer vacations with budgets that must be spent before the end of the year. So this time of year does act as a kind of predictor of future sales opportunities. To put it simply, in IT if you can't sell your product or service in the Fall, you're probably not going to sell at all. This is as true in Cloud Computing as it is in any other area of information technology.
When speaking to the opportunity for cloud computing, I can only speak from my vantage point as a Cloud Service Provider enablement platform vendor. At Enomaly we specifically target service providers and hosting firms who are looking to roll out public "EC2" like infrastructure as a service. From our point of view it has become increasingly clear that any hosting firms that don't have cloud service strategies or offerings in place are quickly beginning to see huge revenue erosion. This has caused a significant influx of interest from a wide variety of hosting related companies that run the gamut from smaller VPS style resellers to multi-national telecommunication companies and everything in between.
An analysis by Guy Rosen also sheds some light on the cloud opportunity. He estimates that Amazon Web Services (AWS) is provisioning 50,000 EC2 server instances per day. A 50K/day run rate would imply a yearly total of over 18 million provisioned instances. Based on these numbers, one could surmise that a significant portion of the revenue behind these 50K daily EC2 instances is coming directly out of the pockets of traditional hosting firms and data centers. In the hosting space, this kind of cloud leakage has become a major issue. One need not do more than monitor traffic to Amazon or other cloud providers to get an idea of the potential revenue walking out the door.
As a fast-growing, self-funded company we don't have the luxury of spending large amounts on our marketing and sales efforts. For the most part we rely on word of mouth and organic search engine optimization for our inbound sales channel. Because we spend a grand total of $0.00 on our marketing efforts, our organic website traffic and inbound sales inquiries also act as a kind of simple market research tool. Based on this very unscientific research tool, interest in cloud platforms is booming.
Over the last few months something interesting has happened. We've seen interest in our cloud service provider platform grow from dozens of inquiries a month to dozens per day. Again, I can't say if this is a broader trend or limited to our sector, but from our vantage point it has never been a better time to be in cloud computing. I'm just curious if others are seeing similar levels of interest for their cloud-related products and services. I for one certainly hope so, because the better we do collectively, the better we do individually.
Webinar: Securing Public Cloud Infrastructures
Mark time in your calendars for a cloud security webinar co-presented by Amazon Web Services and enStratus on Wednesday October 7, 2009 at 11:30 AM - 12:15 PM Central Time US.
Sign up today
Public cloud computing has evolved into a mainstream approach for building out components of an IT infrastructure. Cost saving opportunities make the development of a public cloud strategy absolutely critical. Even before taking on pilot projects in the cloud, however, you should have a solid understanding of the security implications and opportunities in public cloud computing. Amazon Web Services and enStratus have teamed up for this webinar detailing how businesses moving into the cloud can understand the security issues in public cloud computing and how to secure a public cloud infrastructure.
Among the most critical components in cloud security is transparency from your cloud providers. AWS has built out an infrastructure and established processes to mitigate common vulnerabilities and offer a safe compute and storage environment. enStratus operates outside of the AWS cloud, watching over its operations, and keeping your authentication and encryption credentials safe outside the cloud while encrypting the data inside the cloud both in transit and at rest.
Steve Riley from AWS and George Reese from enStratus will discuss common cloud security concerns and show you how to take advantage of the security features AWS and enStratus provide you to build a secure public cloud infrastructure.
Key Learnings
- How does AWS protect its infrastructure and, by extension, your data?
- What can you do with tools like enStratus to further protect your data?
- How can you use enStratus to protect your data from third-party subpoenas or subpoenas targeted at AWS?
- How can I manage user access to my AWS infrastructure?
- What issues impact compliance with various standards/regulations in the AWS cloud?
Speakers
George Reese, O'Reilly cloud computing author and CTO for enStratus, a leading cloud management platform.
Steve Riley, Sr. Technical Program Manager for Amazon Web Services.
>> Steve <<
Building a Unique Data Warehouse
There are many reasons to roll your own data storage solution on top of existing technologies. We've seen stories on HighScalability about custom databases for very large sets of individual data (like Twitter) and large amounts of binary data (like Facebook pictures). However, I recently ran into a unique type of problem. I was tasked with recording and storing bandwidth information for more than 20,000 servers and their associated networking equipment. This data needed to be accessed in real-time, with less than a 5 minute delay between the data being recorded and the data showing up on customer bandwidth graphs on our customer portal.
After numerous false starts with off the shelf components and existing database clustering technology, we decided we must roll our own system. The real key to our problem (literally) was the ratio of the size of the key to the size of the actual data. Because the tracked metric was so small (a 64-bit counter) compared to the unique identifier (32-bit network component ID, 32-bit timestamp, 16-bit data type identifier) existing database technologies would choke on the key sizes.
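To make the key-to-data ratio concrete, here is a minimal sketch that packs one sample using the field widths given above. The function name, the big-endian byte order, and the sample values are assumptions for illustration; the post does not describe the actual on-disk layout.

```python
import struct

# Field widths come from the post; byte order (">" = big-endian, no padding)
# is an assumption made for this sketch.
KEY_FORMAT = ">IIH"   # 32-bit component ID, 32-bit timestamp, 16-bit data type
VALUE_FORMAT = ">Q"   # 64-bit counter


def pack_sample(component_id, timestamp, data_type, counter):
    """Return the packed (key, value) byte strings for one measurement."""
    key = struct.pack(KEY_FORMAT, component_id, timestamp, data_type)
    value = struct.pack(VALUE_FORMAT, counter)
    return key, value


# Hypothetical sample values.
key, value = pack_sample(1234, 1254355200, 1, 987654321)
print(len(key), len(value))        # 10 8
print(len(key) / len(value))       # 1.25
```

The identifier takes 10 bytes while the measurement itself takes only 8, so every row spends more space on the key than on the data it identifies.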
Eventually it was decided that the best solution was to write our own wrapper for standard MySQL databases. No fancy features, no clustering, no merge tables or partitioning, no extra indexes, just hundreds of thousands of flat tables on as many physical machines as was necessary. I chronicled the whole decision making process in the full article, located here, on our developers' blog.
Bioinformatics, Genomes, EC2, and Hadoop
I think it is really interesting to see how breakthroughs and process improvements in one scientific or technical discipline can drive that discipline forward while also enabling progress in other seemingly unrelated disciplines.
The Bioinformatics field is rife with examples of this pattern. Declining hardware costs, cloud computing, the ability to do parallel processing, and algorithmic advances have driven down the cost and time of gene sequencing by multiple orders of magnitude in the space of a decade or two. Processing that was once measured by years and megabucks is now denominated by hours and dollars.
My colleague Deepak Singh pointed out a number of recent AWS-related developments in this space:
JCVI Cloud Bio-Linux
Built on top of a 64-bit Ubuntu distribution, the JCVI Cloud Bio-Linux gives scientists the ability to launch EC2 instances chock-full of the latest bioinformatics packages including BLAST (Basic Local Alignment Search Tool), glimmer (Microbial Gene-Finding System), hmmer (Biosequence Analysis Using Profile Hidden Markov Models), phylip (Phylogeny Inference Package), rasmol (Molecular Visualization), genespring (statistical analysis, data mining, and visualization tools), clustalw (general purpose multiple sequence alignment), the Celera Assembler (de novo whole-genome shotgun DNA sequence assembler), and the NIH EMBOSS utilities. The Celera Assembler can be used to assemble entire bacterial genome sequences on Amazon EC2 today!
There's a getting-started guide for the JCVI AMI. Graphical and command-line bioinformatics tools can be launched from a shell window connected to a running instance of the AMI.
CloudBurst
CloudBurst is described as a "new parallel read-mapping algorithm optimized for mapping next-generation sequence data to the human genome and other reference genomes, for use in a variety of biological analyses including SNP discovery, genotyping, and personal genomics."
In layman's terms, CloudBurst uses Hadoop to implement a linearly scalable search tool. Once loaded with a reference genome, it maps the "short reads" (snippets of sequenced DNA approximately 30 base pairs long) to a location (or locations) on the reference genome. Think of it as a very advanced form of string matching, with support for partial matches, insertions, deletions, and subtle differences. This is a highly parallelizable operation; CloudBurst reduces operations involving millions of short reads from hours to minutes when run on a large-scale cluster of EC2 instances.
You can read more about CloudBurst in the research paper. This paper includes benchmarks of CloudBurst on EC2 along with performance and scaling information.
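To give a feel for what "mapping a short read with partial matches" means, here is a toy sketch. It is not CloudBurst's algorithm: it is a brute-force, single-machine scan that tolerates substitutions only (no insertions or deletions), and the function names and sample sequences are made up for illustration.

```python
def hamming(a, b):
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def map_read(reference, read, max_mismatches=1):
    """Return every offset where `read` aligns to `reference`
    with at most `max_mismatches` substituted bases."""
    hits = []
    for pos in range(len(reference) - len(read) + 1):
        window = reference[pos:pos + len(read)]
        if hamming(window, read) <= max_mismatches:
            hits.append(pos)
    return hits


reference = "ACGTACGTTAGC"
print(map_read(reference, "ACGT", max_mismatches=0))  # [0, 4]
print(map_read(reference, "TAGG", max_mismatches=1))  # [3, 8]
```

Each read is mapped independently of every other read, which is why the work divides so naturally across a cluster.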
Crossbow
Crossbow was built to do "Whole Genome Resequencing in the Clouds." It combines Bowtie for ultra-fast short read alignment and SOAPsnp for sequence assembly and high quality SNP calling. The Crossbow home page claims that it can sequence an entire genome in an afternoon on EC2, for less than $250. Crossbow is so new that the papers and the code distribution are still a little ways off. There's a lot of good information in the Crossbow poster.
Michael Schatz (the principal author of CloudBurst and Bowtie) wrote a really interesting note on Hadoop for Computational Biology. He states that "CloudBurst is just the beginning of the story, not the end" and endorses the Map/Reduce model for processing 100+GB datasets. I will echo Mike's conclusion to wrap up this somewhat long post:
"In short, there is no shortage of opportunities for utilizing MapReduce/Hadoop for computational biology, so if your users are skeptical now, I just ask that they are patient for a little bit longer and reserve judgment on MapReduce/Hadoop until we can publish a few more results."
I really learned a lot while putting this post together and I hope that you will learn something by reading it. If you are using EC2 in a bioinformatics context, I'd love to hear from you. Leave a comment or send me some mail.
-- Jeff;
New Public Data Set: Wikipedia XML Data
Weighing in at a whopping 500 GB (388 GB of data and 112 GB of free space to allow for some in-place decompression), the Wikipedia XML data is our newest Public Data Set.
This data set contains all of the Wikimedia wikis in the form of wikitext source and metadata embedded in XML. We'll be updating this data set every month and we'll keep the sets for the previous three months around.
As you can see from this screen shot of my PuTTY window, there are some pretty beefy files in this data set:
As an example of what can be done with this data, take a look at Cloudera's blog post on Grouping Related Trends with Hadoop and Hive. This article shows how to create a trend tracking site using a Cloudera Hadoop cluster running on EC2, using Apache Hive queries to process the data.
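Files this large are best read as a stream rather than loaded whole. The sketch below shows one way to do that with Python's standard library. It assumes the usual MediaWiki export layout of page, title, revision, and text elements, and it runs against a tiny inline sample rather than a real dump file; the function names and sample content are illustrative.

```python
import io
import xml.etree.ElementTree as ET

# Tiny stand-in for a dump file; real dumps are far larger and
# declare an XML namespace, which local_name() strips off.
SAMPLE = b"""<mediawiki>
  <page>
    <title>Alpha</title>
    <revision><text>First article text.</text></revision>
  </page>
  <page>
    <title>Beta</title>
    <revision><text>Second article text.</text></revision>
  </page>
</mediawiki>"""


def local_name(tag):
    """Drop any '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(stream):
    """Yield (title, wikitext) pairs one page at a time."""
    title, text = None, None
    for _event, elem in ET.iterparse(stream, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, text
            title, text = None, None
            elem.clear()  # release the finished page's children


for page_title, wikitext in iter_pages(io.BytesIO(SAMPLE)):
    print(page_title, len(wikitext))
```

To read from the data set itself, pass an open binary file handle to `iter_pages` instead of the in-memory sample.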
-- Jeff;
New Public Data Set: Daily Global Weather
The folks at Infochimps have just released the Daily Global Weather Public Data Set.
This 20 GB data set incorporates daily weather measurements (temperature, dew point, wind speed, humidity, barometric pressure, and so forth) from over 9000 weather stations around the world. The data was originally collected as part of the Global Surface Summary of the Day (GSOD) by the National Climatic Data Center and is available from 1929 to the present, with the data from 1973 to the present being the most complete.
The map at right contains one yellow dot for each data collection station.
-- Jeff;
New Public Data Set: Sloan Digital Sky Survey DR6 Subset
The Sloan Digital Sky Survey, or SDSS, is now available as a Public Data Set.
Weighing in at 180 GB, the SDSS is the most ambitious astronomical survey ever undertaken. The researchers have used a 2.5 meter, 120 megapixel telescope located in Apache Point, New Mexico to capture images of over one quarter of the sky, or about 230 million celestial objects. They have also created 3-dimensional maps containing more than 930,000 galaxies and 120,000 quasars.
This new public data set (which is a subset of the entire SDSS) will be of interest to students, educators, hobby astronomers, and researchers. From a standing start, it is possible to launch an EC2 instance, create an Elastic Block Store volume with this data, attach the volume to the instance and start examining and processing the data in less than ten minutes.
The data set takes the form of a Microsoft SQL Server MDF file. Once you have created your EBS volume and attached it to your Windows EC2 instance, you can access the data using SQL Server Enterprise Manager or SQL Server Management Studio. The SDSS makes use of stored procedures, user defined functions, and a spatial indexing library, so porting it to another database would be a fairly complex undertaking.
I know from experience (my son Andy is studying Astronomy at the University of Washington and is always showing me the "please delete your unnecessary files" emails from the department's administrator) that storage space is always at a premium in academic settings, due in part to the existence of large scale data sets like this. The combination of EC2, EBS, this public data set, and our AWS in Education program should enable students and educators to analyze, process, display, and study the universe in revolutionary ways.
-- Jeff;
Private Data Cloud: 'Do It Yourself' with Eucalyptus
Why are enterprises implementing Private Clouds if the Public Cloud deployment model is gaining in popularity day by day? Guy Rosen summarizes Public Cloud growth within the user base of the Amazon Elastic Compute Cloud (EC2). Since its debut in 2006, 8.4 million EC2 instances have been launched. Impressive as these statistics are, many enterprises still consider the Public Cloud a no-go area for now. Reasons include data security and SLA concerns, data compliance/governance regulations and the complexity of migrating legacy applications. This is where Private Clouds step in.
Private Clouds provide many of the benefits of the Public Cloud, namely elastic scalability, faster time-to-market and reduced OpEx, all within the enterprise's own perimeter and in compliance with its governance. Leading commercial Private Cloud products include VMware, Univa UD, and Unisys. Open source solutions include products like Globus Nimbus, Enomaly Elastic Computing Platform, RESERVOIR and Eucalyptus.
read more at: http://bigdatamatters.com/bigdatamatters/2009/09/private-cloud-eucalyptu...