What is the right open data portal platform for our needs in Northeast Ohio? Code for America’s Field Guide thankfully reminded me that before answering the “What?” we should clarify the “Why?” We want to open data in NEO in order to foster data-driven decision making, improve the relationship between citizens and government, and develop engaging technology that will make Cleveland a more livable city. To be honest, an open data portal seems minor relative to the relationship-building mountain of work ahead of us. Our portal must support this growth, but it is not an end in itself.
We need something that will serve our initial efforts in this area, that we won’t need to sink much time or money into, and that can grow/adapt/extend once we make headway with our central objectives. Now we just need to figure out what that is. Here’s a list of the factors that distinguish the portals available on the market.
“Turn-Key” Versus “Technical Expertise Needed”
The biggest myth floating in the ether about portals is the idea that Socrata is “turn-key” (good to go out of the proverbial box), while other products require technically savvy folks to implement and maintain. In reality, options for hosted and supported portals are varied.
Do you need in-house technical expertise to implement and maintain?
|Other||Open Data Catalog (by Azavea)||Socrata
SaaS (Software as a Service) and IaaS (Infrastructure as a Service) are helpful terms to know. These are categories of services where you pay someone to provide software and infrastructure instead of building and maintaining them in-house (a.k.a. “hosting and supporting” your portal, a.k.a. being a SaaS or IaaS provider). As the chart above shows, in addition to Socrata and Junar, there are a few SaaS providers who will host and support CKAN and DKAN.
There’s not a ton to distinguish these portal products. They all allow you to search or browse by topic through datasets, host data in multiple formats, manage uploading and editing permissions, and support commenting. NuCivic’s DKAN and Socrata both support mapping and generate charts of individual datasets. Their products will put geospatial data on a map and will turn tables into charts after prompting the user for some very simple input. Socrata’s product allows you to rate datasets, and provides sorting options based on ratings and number of comments.
When it comes to visualizations, I am very skeptical of the mapping and charting technology that DKAN and Socrata provide. At first glance, their maps and charts are exciting and beautiful. The more I examined them, though, the more I struggled to understand what value they added. These are tools that most users may use for a couple of minutes, walk away from, and never use again. Real data crunchers would be looking to analyze multiple datasets simultaneously (which none of these visualization tools can do), so they will need to export their data and analyze it using independent software. Your casual visitors, on the other hand, won’t take the time to turn data into charts; they need cultivated content, charts and data knit together with a narrative, which is a service you’ll need to go beyond the portal to provide. An actual human may need to be involved. NuCivic’s DKAN, thanks to the Drupal integration (explained below), is the only product that makes other content such as stories and useful visualizations easy to incorporate, which gives it an edge.
Integration with Content Management Systems (CMS)
The portal products on the market are databases, which are as sexy as you’d imagine; I was reminded over the course of my research of just how boring data is in isolation. Even the portals that offer basic visualization and mapping tools still don’t tell meaningful stories about the data, which is what a lot of our visitors are going to be looking for. To share stories, we’ll need a content management system. A CMS helps coordinate the story-telling and community-building role of the website, while the data portal (database) can be positioned as a resource within that site, the same way a library website features basic articles/static pages of information and also contains a portal for a database of books. There are dozens of CMSs; Drupal (data.gov.uk) and WordPress (data.gov) are prevalent in this sphere, and it’s not unheard of to build your own.
Fortunately, almost all portals on the market can be pretty easily integrated with a CMS. The only caveat: DKAN is a portal that “pre-integrates” Drupal and the CKAN platform. “DKAN seeks to mimic the features and API of CKAN natively within Drupal, so there’s just a single LAMP stack of software to deal with rather than an integration of two different open source platforms, as in the case of WordPress+CKAN or Drupal+CKAN” (ahoppin via stackexchange.com). I’ll have to write another post unpacking what the hell that means. For our purposes today, if you know you want to integrate with a CMS and you either already use Drupal or are neutral about CMS platform, DKAN is worth a closer look.
CKAN and DKAN are both solidly open source, which is nice on principle and because of the community it brings with it. I must admit, the functional benefits of open source are a little grey to me. Socrata has launched an open source Community Edition, but it seems pretty lightweight for our needs, and whether that means they can fully claim being an open source player is out of this article’s scope.
The more helpful way to think about open source’s functional value is to look at what we’d lose without it, which leads us to “lock-in”…
Reason would indicate that if you’re using a SaaS provider and the software they provide is not freely available/open source, then when you end your contract, migrate to your own hosting, and start managing independently, you have to leave their software behind. Is that a real risk for Socrata clients, and how expensive is it (in time and money) to leave?
Data lock-in and API lock-in are the two main potential risks (mcInnis per StackExchange). Based on my cursory understanding of these databases, I’m confident that data lock-in is not an issue. If you have a bunch of apples in one basket (Socrata) and you want to move them to another basket (self-hosted CKAN), you can; by nature of being open, the apples/data are yours to take and the dataset formatting prescribed by Socrata (CSV, JSON, XML) will readily transition to a new database. The potential for API lock-in is a little more concerning, but I believe the risks are not insurmountable. In the event of parting company with Socrata, Socrata claims that you could use the open source version of their server to support any important apps you’d created on Socrata’s platform. So is it possible? Yes. Will the transition be as easy as with other open source platforms that you could continue to use independently? No. If you leave Socrata to build your own system or use CKAN/DKAN, you’ll forever have to support the Socrata server to keep your old apps alive.
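To make the apples-and-baskets point about data lock-in a little more concrete, here is a minimal sketch in Python of why plain CSV and JSON exports travel well. Everything in it is an assumption for illustration: the dataset is made up, the function names are hypothetical, and it uses only the Python standard library rather than any Socrata or CKAN API. It shows the general idea, not how either platform actually imports or exports data.

```python
import csv
import io
import json


def csv_to_records(csv_text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def records_to_json(records):
    """Serialize the records as a JSON array of objects."""
    return json.dumps(records, indent=2)


def records_to_csv(records):
    """Write records back to CSV, using the first record's keys as columns."""
    if not records:
        return ""
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=list(records[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return out.getvalue()


# Hypothetical export: a made-up dataset standing in for a file
# downloaded from whichever portal you are leaving.
EXPORTED_CSV = "parcel_id,ward,vacant\n001,3,true\n002,7,false\n"

records = csv_to_records(EXPORTED_CSV)            # CSV -> rows
as_json = records_to_json(records)                # rows -> JSON
round_trip = records_to_csv(json.loads(as_json))  # JSON -> CSV again
```

In this sketch the data survives the trip from CSV to JSON and back unchanged, because nothing about either format is tied to a vendor. The harder part of a migration is what the sketch leaves out: metadata, permissions, and any apps built against the old API.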
All this suggests a third type of lock-in that gets to the heart of the issue: risk-averse lock-in. While leaving Socrata is doable, it’s going to be more complicated to pull off, and that will be prohibitive for a lot of the risk-averse users out there. I give serious weight to one developer’s perspective (ahoppin per StackExchange), “even if you’re buying a turnkey SaaS solution, I would always go with a one that you could always change your mind and take over direct responsibility for if you ever need to. For now, CKAN and DKAN are stronger in that regard.”
Don’t want to host your product on your own servers? Don’t have servers? Cloud storage is possible with most of these products. If you use a SaaS provider, hosting will be part of their service. If you want to manage your own system, but host externally, you have a lot of options as well. As a new nonprofit with little physical infrastructure, we have to take this question seriously. We have three options: 1) use a portal SaaS provider and their hosting capabilities, 2) implement/maintain the platform ourselves, but host in the cloud using an IaaS provider, or 3) implement/maintain the platform ourselves and host on a partner organization’s server. OpenSource.com’s “OpenSaaS and the future of government IT innovation” provides some pretty detailed scenarios if you want to learn more.
In addition to the portal product, Socrata also offers two other services: “GovStat,” which supports internal performance management, and “API Foundry,” which is an automation tool for developers. I believe Accela (CKAN) and NuCivic (DKAN) offer features parallel to API Foundry, but I couldn’t find much information about parallel features for GovStat, and I suspect that Socrata is ahead of the others in this area. In spite of some of the lock-in concerns associated with Socrata, their vision of combining internal performance management structures and publicly accessible data portals is really compelling. GovStat is an impressive product for governments/organizations who want to manage with data, and it could also be a game-changer for coalition organizers like us who want to lead with clear and measurable goals.
Working with a Software as a Service provider significantly reduces liability. Potential risks and control considerations include: controlling upload and editing privileges, ensuring website security, dealing with expensive development complications, and covering the time and expense of hiring and retaining appropriately skilled technologists. The SaaS provider (Socrata, Accela, etc.) will absorb almost all those risks. This is one of the most compelling reasons to go SaaS. If we decide to host/develop independently, we should anticipate and plan for these (and other) associated risks.
Aside from soliciting bids from various SaaS providers, the most reliable approach I’ve found to estimate pricing is to find proposals from other cities who have contracted with prospective providers. The building phase will be the most expensive. Assuming smooth sailing, my research indicates that Socrata is the most expensive, the other SaaS providers are in the middle, and developing an open source platform in-house is cheapest (provided you’re well connected with a tech-savvy community who can help you develop an RFP; for example, I’ll be looking to Cleveland’s Code for America brigade). However, smooth sailing is more the exception than the rule, and if we assume Murphy’s law, in-house development costs could easily eclipse those of the SaaS providers. Once built, maintenance is much more affordable with in-house support, but we’d still need to be prepared with extra tech capacity if issues arise. I’d hate to have our site crash and not be able to get in touch with the grad student we’d contracted to maintain it; if we choose to maintain in-house, we’d need to take it seriously and be ready for disaster.
This is a relatively minor point, but worth noting. CKAN/DKAN products have a slightly different default list layout. The left sidebar menu and browse list view on Chicago’s Data Portal homepage are typical of most of the Socrata setups I’ve encountered. I especially like how much content is listed above the fold; it’s tight and functional. For a casual user, however, it may feel dense and stuffy. The CKAN list view has a little more breathing room, but I suspect it would become tedious to use if you’re a seasoned researcher browsing for specific content.
Parting Thoughts: Engagement
If we were going it alone, I’d be leaning toward DKAN at this point (either through NuCivic or self-developed/hosted) because I care a lot about meaningful stories and visualizations. Meaningful, narrative content that goes beyond datasets in a portal is so central to our work that our minimum viable product will require smooth CMS integration. The GovStat features of Socrata are really exciting, but our coalition is easily 18 months away from being able to define coalition metrics that might lend themselves to integration with the portal; IMHO the potential is too fuzzy at this point to justify a commitment to a Socrata product.
Since we’re not going it alone, though, I’m hoping to use this framework to engage our current partners (including Code for America, One Community, NEO CANDO, and the Urban Health Initiative) to help make a collaborative decision about which platforms best fit our needs. And, since providing meaningful opportunities for citizens to participate is also a priority, we may use this decision as a chance to consult and incorporate the needs of real people. Stay tuned.
“CKAN versus Socrata Discussion on Stack Exchange” (opendata.stackexchange): http://opendata.stackexchange.com/questions/1517/ckan-vs-socrata
“What’s the difference between Socrata and CKAN in open data platform?” (Quora): http://www.quora.com/Whats-the-difference-between-Socrata-and-CKAN-in-open-data-platform
“OpenSaaS and the future of government IT innovation” (OpenSource.com): http://opensource.com/government/14/1/opensaas-and-government-innovation
“Open Data FAQ” (Sunlight Foundation): https://sunlightfoundation.com/policy/opendatafaq#portal
“Open by Default [beta]” (Code for America): http://www.codeforamerica.org/governments/principles/open-data/