Amazon Provides Public Data Sets
Seattle, Washington – (Website Hosting Directory) – December 15, 2008 – Amazon.com is providing access to a centralized repository of public data sets that can be seamlessly integrated into AWS cloud-based applications.
AWS is hosting the public data sets at no charge for the community, and, as with all AWS services, users pay only for the compute and storage they consume with their own applications. Data sets already available include various U.S. Census databases from the U.S. Census Bureau, 3-D chemical structures provided by Indiana University, and an annotated form of the Human Genome from Ensembl. More data sets will be available soon, including a wide range of economic statistics from the Bureau of Economic Analysis and additional scientific data sets.
Previously, large data sets such as the Human Genome and U.S. Census data required many hours to locate, download and customize. Now, anyone can access these large data sets from their Amazon Elastic Compute Cloud (Amazon EC2) instances and start computing on the data within minutes. By growing the number of people with access to important and useful data, and making it easy to compute on that data with cost-efficient services such as Amazon EC2, AWS hopes to fuel innovation and further accelerate the pace of new discoveries.
Adam Selipsky, Vice President of Product Management and Developer Relations for Amazon Web Services, remarked, “For over five years AWS has been working to lower the barriers to entry, level the playing field, and make it possible for our customers to be successful based on their ideas, not on their resources. Public Data Sets on AWS is the latest of these efforts, and we can’t wait to see the discoveries and innovations that could stem from this ecosystem.”
Select public data sets are hosted on Amazon EC2 at no charge as Amazon Elastic Block Store (Amazon EBS) snapshots. Amazon EC2 customers can access this data by creating their own personal Amazon EBS volumes, using the public data set snapshots as a starting point. They can then access, modify and perform computation on these volumes directly from their Amazon EC2 instances, paying only for the compute and storage resources that they use. Where available, researchers can also use pre-configured Amazon Machine Images (AMIs) with tools like Inquiry by BioTeam to perform their analysis.
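The snapshot-to-volume workflow described above can be sketched in a few lines of Python. This is an illustration only, not the procedure given in the announcement: it assumes the present-day boto3 SDK, which postdates this release, and the function name, the snapshot and instance identifiers, and the /dev/sdf device name are all hypothetical placeholders.

```python
def volume_from_public_snapshot(ec2, snapshot_id, instance_id,
                                availability_zone, device="/dev/sdf"):
    """Create an EBS volume from a snapshot and attach it to an instance.

    `ec2` is assumed to be a boto3 EC2 client, e.g. boto3.client("ec2").
    The default device name is an assumption, not taken from the article.
    """
    # A volume can only be attached to an instance in the same
    # Availability Zone, so the caller passes the instance's zone.
    volume = ec2.create_volume(
        SnapshotId=snapshot_id,
        AvailabilityZone=availability_zone,
    )
    volume_id = volume["VolumeId"]

    # Block until the new volume is ready to be attached.
    ec2.get_waiter("volume_available").wait(VolumeIds=[volume_id])

    # Attach the volume; it then appears to the instance as a block device.
    ec2.attach_volume(VolumeId=volume_id, InstanceId=instance_id, Device=device)
    return volume_id


# Example call (every identifier below is a placeholder):
#   import boto3
#   ec2 = boto3.client("ec2", region_name="us-east-1")
#   volume_from_public_snapshot(ec2, "snap-EXAMPLE", "i-EXAMPLE", "us-east-1a")
```

Once attached, the volume still has to be mounted inside the instance before the data can be read. Storage charges for the new volume and compute charges for the instance apply as described above.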
Dr. Peter Tonellato from the Harvard Medical School noted, “Public Data Sets on AWS will enable me and many of my colleagues to collaborate with each other by sharing our commonly used data sets, research environments and tools. We can set up a controlled environment in minutes, run our computational analysis for a couple of hours, and shut down the environment. Our results are completely repeatable. I only pay for the compute time I use, and more importantly I can spend more time focusing on research, not downloading and setting up computational infrastructure.”
Dr. Glenn Proctor, Ensembl Software Coordinator at the EBI, commented, “Bioinformatics is a hugely exciting area which is providing much insight into our understanding of biology and, particularly, the genetic basis of many human diseases like cancer and diabetes. The genome is a complex thing, however; it presents us with a potential source of invaluable information but also with great challenges in how to store, analyze and annotate it, and how to make both the raw genomic information and our annotations available to as many people as possible. Ensembl’s approach has always been to try and lower the barriers to entry so that a researcher using a desktop PC in a lab or a laptop in an airport departure lounge has access to high-quality, up-to-the-minute genetic information that they can use in their work. Amazon EC2 allows us to go even further and make all our data available in a robust, scalable and flexible form that anyone with an AWS account can use.”
Amazon Web Services provides Amazon’s developer customers with access to in-the-cloud infrastructure services based on Amazon’s own back-end technology platform, which developers can use to enable virtually any type of business. Examples of the services offered by Amazon Web Services are Amazon Elastic Compute Cloud (Amazon EC2), Amazon Simple Storage Service (Amazon S3), Amazon SimpleDB, Amazon Simple Queue Service (Amazon SQS), Amazon Flexible Payments Service (Amazon FPS), and Amazon Mechanical Turk.
For more information about the Public Data Sets on AWS, please visit: www.aws.amazon.com/publicdatasets.
To learn more, please visit: www.amazon.com.


