
I love a good database analogy

by Sue Raisty-Egami on March 27, 2012

in Big Data

I love it when a gnarly technical concept can be elegantly explained via a good analogy. In fact, I have found myself searching for these perfect analogies throughout my career.

I’ve been working on NoSQL projects of late, and found the following two articles, both with brilliant analogies, to be very helpful:

  1. “Starbucks Does Not Use Two-Phase Commit” – This article is not about NoSQL databases per se, but it does illustrate the point that sometimes you can afford to lose a transaction or have your database in an inconsistent state temporarily.
  2. “A Plain English Introduction to the CAP Theorem” – This is a brilliant explanation of the trade-offs you need to make when scaling a database: consistency, availability, and partition tolerance. If you’ve ever had a hard time wrapping your head around these concepts, this article will help.


Economics are a big deal when deciding whether to go with a NoSQL, NewSQL, MPP DBMS, or traditional RDBMS.

Many of the products (in all four of these categories) claim that economics are on their side, because either…

  1. Their products are ultimately cheaper than the alternatives, or
  2. Their products produce greater benefits that dramatically overshadow their cost.

The first is basically a Total Cost of Ownership (“TCO”) argument. The second is about higher ROI (return on investment) or shorter payback periods, versus the alternatives.
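To make the three terms concrete, here is a minimal sketch of how they are commonly computed. The function names and all dollar figures are hypothetical, chosen only for illustration, and the formulas ignore refinements such as discounting future cash flows.

```python
def total_cost_of_ownership(capex: float, annual_opex: float, years: int) -> float:
    """Up-front cost plus operating cost over the evaluation period."""
    return capex + annual_opex * years


def roi(total_benefit: float, total_cost: float) -> float:
    """Return on investment as a fraction: net gain divided by cost."""
    return (total_benefit - total_cost) / total_cost


def payback_period_years(capex: float, annual_benefit: float, annual_opex: float) -> float:
    """Years until the cumulative net benefit covers the up-front cost."""
    annual_net = annual_benefit - annual_opex
    if annual_net <= 0:
        return float("inf")  # the investment never pays for itself
    return capex / annual_net


if __name__ == "__main__":
    # Hypothetical figures, not taken from any vendor or project.
    capex, opex, benefit, years = 500_000.0, 100_000.0, 350_000.0, 3
    cost = total_cost_of_ownership(capex, opex, years)
    print(f"{years}-year TCO: ${cost:,.0f}")
    print(f"{years}-year ROI: {roi(benefit * years, cost):.1%}")
    print(f"Payback: {payback_period_years(capex, benefit, opex):.1f} years")
```

A product can win on one measure and lose on another, which is why it helps to compute all three for each candidate.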

The TCO, ROI, and payback period for any particular data store (be it of the NoSQL, NewSQL, MPP, or RDBMS variety) will vary greatly according to the application. It is important not to take vendor claims of “X% improvement” for granted.

Instead, create your own economic model for quantifying expected costs and benefits. Here are some factors to consider:

1. Hardware costs – Many vendors claim their products require cheaper hardware, either by using fewer machines or by using cheaper machines. Hardware costs affect both up-front project costs (CapEx) and the yearly maintenance fees (operational costs) paid to the hardware vendors. Hardware costs can usually be further subdivided into processors, memory, disk storage, network interfaces, racks, load balancers, dedicated storage appliances, etc. Memory can be a significant cost driver, as can the type of storage used (SSDs, SANs, RAID arrays of regular old hard disks, and so on).

2. Software license costs & maintenance – The cost here can vary wildly, from $0 for one of the many open source NoSQL solutions, to millions of dollars.  This item includes up-front CapEx costs – as well as the annual maintenance bill, typically about 20% of the purchase price.

3. Software support costs – If you buy a commercial data store, customer support is probably included in your annual software maintenance bill (see the previous item). But if you are looking at open source, you will want to find a vendor that can provide ongoing software support. And that costs money, even if the software itself is free. This is an ongoing operational expense.

4. Power costs – If you use fewer boxes, or use more energy-efficient hardware, then your power bill will be lower, sometimes significantly so. If you want to get fancy and impress others with your “green-ness”, consider calculating your carbon footprint, and take into account any carbon offsets you’d purchase to reduce the system’s environmental impact.

5. Administrative costs – This bucket includes the cost of the staff needed to keep your system running and healthy, usually the fully loaded salaries of the DBAs and other administrators. In general, the more hardware you are running, and the more instances of software you are running, the more administrative staff you need. However, different products place different demands on administrators. For example, “sharding” (sensibly dividing up data across multiple nodes) can consume a lot of administrator time if done manually, but many NoSQL and NewSQL systems do this automatically. Recovering from downtime, both planned and unplanned, can consume large amounts of administrative time, so the availability of the system affects these costs. Other tasks that can take up a lot of time, but vary considerably from system to system, include time spent on upgrades (especially if updates to the software come out very frequently), time spent on performance tuning, and time spent on monitoring, backups, etc. Different products, and different rules governing data and IT, require varying levels of attention from human administrators.

6. Developer productivity – If the structure and type of data in the data store change, the applications that use the data store often need to be changed as well. So, you should account for the fully loaded salaries (or consulting fees) for the programmer time needed to make these changes.

7. “Hard” revenue impact – Different configurations of products will achieve different levels of availability. This ultimately translates into some amount of system downtime or time when the system is overloaded. If your system is essential to your company actually making money (for example, it is an e-commerce store), then every minute of downtime results in lost revenue. If the system is still “up” but is overloaded, then you lose the ability to serve some customers who would have bought. Again, the result is lost revenue.

8. “Soft” revenue impact – This includes things like “greater customer satisfaction”, “higher accuracy”, “better response times” – things your choice of system might affect that will impact customers, and will thereby affect the amount of revenue they give your company. “Soft” costs are very real, but are often difficult to quantify.
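The “hard” revenue impact in factor 7 lends itself to a back-of-the-envelope calculation. The sketch below converts an availability figure into expected downtime and lost revenue. It assumes, as a deliberate simplification, that revenue is spread evenly across every hour of the year; the function names and the revenue figure are hypothetical.

```python
HOURS_PER_YEAR = 24 * 365


def annual_downtime_hours(availability: float) -> float:
    """Expected hours of downtime per year for an availability such as 0.999."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be a fraction between 0 and 1")
    return (1.0 - availability) * HOURS_PER_YEAR


def annual_lost_revenue(availability: float, revenue_per_hour: float) -> float:
    """Revenue lost to downtime, assuming revenue is uniform over time."""
    return annual_downtime_hours(availability) * revenue_per_hour


if __name__ == "__main__":
    # Hypothetical e-commerce store earning $1,000 per hour.
    for availability in (0.99, 0.999, 0.9999):
        hours = annual_downtime_hours(availability)
        loss = annual_lost_revenue(availability, 1_000.0)
        print(f"{availability:.2%} available: {hours:.2f} h down, ${loss:,.0f} lost")
```

A store whose sales peak at certain hours would need to weight downtime by when it occurs, so treat the result as a rough starting estimate.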

Note that the above factors assume that you will own your infrastructure – the hardware, software, etc. The model needs tweaking or revamping if you are “renting” infrastructure in the cloud, for example by using something like Amazon Web Services. (In the future, perhaps I’ll post about the economics of cloud deployments.)
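One way to start building your own model is to gather factors 1 through 7 into a small structure and compare candidates over the same time horizon. The sketch below is only an illustration under stated assumptions: the class name, field names, and every figure are hypothetical, the 20% license maintenance rate is the typical value mentioned in factor 2, and the “soft” revenue impact (factor 8) is left out because it is hard to quantify.

```python
from dataclasses import dataclass


@dataclass
class DataStoreOption:
    """One candidate system; every figure is an estimate you supply."""
    name: str
    hardware_capex: float            # factor 1, up-front
    hardware_maintenance: float      # factor 1, per year
    license_capex: float             # factor 2, up-front
    license_maintenance_rate: float  # factor 2, yearly fraction of license price
    support: float                   # factor 3, per year
    power: float                     # factor 4, per year
    admin_staff: float               # factor 5, per year
    developer_time: float            # factor 6, per year
    lost_revenue: float              # factor 7, per year

    def capex(self) -> float:
        return self.hardware_capex + self.license_capex

    def annual_opex(self) -> float:
        return (
            self.hardware_maintenance
            + self.license_capex * self.license_maintenance_rate
            + self.support
            + self.power
            + self.admin_staff
            + self.developer_time
        )

    def total_cost(self, years: int) -> float:
        """Up-front cost plus yearly operating cost and lost revenue."""
        return self.capex() + years * (self.annual_opex() + self.lost_revenue)


if __name__ == "__main__":
    # Hypothetical candidates; replace with your own estimates.
    options = [
        DataStoreOption("commercial", 200_000, 20_000, 500_000, 0.20,
                        0, 15_000, 150_000, 50_000, 10_000),
        DataStoreOption("open source", 300_000, 30_000, 0, 0.20,
                        60_000, 25_000, 200_000, 40_000, 20_000),
    ]
    for option in sorted(options, key=lambda o: o.total_cost(3)):
        print(f"{option.name}: ${option.total_cost(3):,.0f} over 3 years")
```

The ranking can flip as the horizon changes, since a high up-front cost is spread over more years, so it is worth running the comparison for several values of `years`.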


The Best Books on Product Management

January 8, 2012

On Quora, someone asked “What are some must-read books for product managers?” My answer is reproduced below, and is currently the leading answer on the topic, having been up-voted 10 times. For product managers working on high tech products and early technologies, the following are absolute must-reads: Crossing the Chasm, by Geoffrey Moore; Inside the Tornado, by [...]

Read the full article →

Hadoop, Traditional Data Warehouses, and ETL

November 21, 2011

Hadoop is just starting to come into mainstream consciousness. As a result, a lot of people are grappling with understanding the relationship between Hadoop and traditional data warehouses, and how ETL (Extract, Transform, Load) fits into the picture. On one of the “Big Data” forums on LinkedIn, someone asked the question below. Scroll down for [...]

Read the full article →

Best Blogs on Big Data

February 13, 2011

In the past year, Sure Product Consulting has done a lot of work in the emerging “Big Data” sector. Our clients have included newborn startups building their business models on top of NoSQL technologies, as well as traditional business intelligence and database vendors. In the process, we’ve had to come up to speed quickly on [...]

Read the full article →

To Appliance or Not To Appliance

January 31, 2011

I’ve had a lot of conversations lately with product managers who are wrestling with the appliance question:  Should they create a software-only product, or should they deliver an integrated hardware-plus-software appliance that contains all the underlying hardware and software needed to run the product? It’s an important question because: It has broad ramifications for the [...]

Read the full article →

“In Memory Analytics” – Another Hype Marketing Example

September 20, 2010

Big Data is getting really hot these days, and along with it, some of its pet terms. One of those terms is “In Memory Analytics.” All of a sudden EVERYBODY, be they providers of data warehouse appliances, OLAP tools, data visualization tools, or other business intelligence derivatives, is claiming to have “In Memory [...]

Read the full article →

Forbes.com – Talking about software for small businesses

June 18, 2010

Forbes.com recently interviewed me about the software and online services I use to run Sure Product Consulting. They were interested in my perspective as a self-interested small business owner and as an expert in evaluating, defining and launching software products with real business value. Check it out: “A Software Maven Picks Her Tools,” by David [...]

Read the full article →

“Freemium” Business Models – How to Decide What’s Free and What’s Not

April 22, 2010

A colleague asked me this the other day: If a company is pursuing a “freemium” business model, how should they determine the optimal mix of features to offer in free vs. paid software? It’s a good question. Freemium – where a product is made available in both a free and paid-for version – is a [...]

Read the full article →

Why You Need a Product Strategy. Now.

April 16, 2010

I’ve said it before, and I’ll say it again.  All product managers should develop well-researched and well-supported product strategies, even if your boss is not requiring it.  It’s difficult to make the time, especially with the dozens of activities that product managers are usually simultaneously juggling, but it must be done. Why? Otherwise, you will [...]

Read the full article →