CIVX Tour

CIVX is an open source public data aggregation framework that focuses on government transparency and on transitioning raw data into open, indexable formats. Our core platform is designed to simplify the process of modeling, scraping, scrubbing, correlating, and visualizing raw data.

This tour points out the types of information that CIVX aggregates as well as the nuances of the interface. Please feel free to contact us with any questions.

Front Page

The CIVX landing page outlines our mission and technical requirements. From here you can access the CIVX Navigation Menu on the left side of the page. The navigation contains three top-tier items: State, Federal, and Feeds.

  • State contains information related to state-specific datasets, primarily focusing on NYS, which was our initial case study.
  • Federal contains three items:
    • US Code
    • Federal Officials
    • Stimulus - StimulusWatch.org Grid
  • Feeds contains syndicated website updates from governmental entities, transparency organizations, and other government-related projects.

https://fedorahosted.org/releases/c/i/civx/tour/civx-frontpage.png



US Code

CIVX contains an experimental app for tracking and viewing changes in the US Code. It uses Archive.org to find as many versions of the US Code from uscode.house.gov as possible, and commits them to a version control system. The topmost grid contains the scraped file name, the chapter, and the title of the section of the United States Legal Code. Selecting a file displays a list of changes to that file in the bottom grid. You can then select each row to view the actual changes to the file.

When a specific revision is selected, a "diff" is shown between that revision of the file and the version from the previously committed changeset. Red text indicates a line that has been removed, and green text indicates a line that has been added.

https://fedorahosted.org/releases/c/i/civx/tour/civx-uscode1.png

https://fedorahosted.org/releases/c/i/civx/tour/civx-federal-uscode-diff.png
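
The red and green lines described above correspond to the removed and added lines of an ordinary unified diff. The following is a minimal sketch using Python's standard difflib module. The two text snippets are made-up placeholders rather than real US Code text, and this is not necessarily how CIVX itself produces its diffs.

```python
import difflib


def classify_diff(old_text, new_text):
    """Return (removed, added) lines between two revisions of a file."""
    removed, added = [], []
    diff = difflib.unified_diff(
        old_text.splitlines(),
        new_text.splitlines(),
        fromfile="previous",
        tofile="selected",
        lineterm="",
    )
    for index, line in enumerate(diff):
        if index < 2:
            continue  # the first two lines are the '---' / '+++' file headers
        if line.startswith("-"):
            removed.append(line[1:])  # would be rendered in red
        elif line.startswith("+"):
            added.append(line[1:])  # would be rendered in green
    return removed, added


# Placeholder revisions, invented for this example.
old = "Sec. 1. Placeholder text.\nSec. 2. Old wording.\n"
new = "Sec. 1. Placeholder text.\nSec. 2. New wording.\nSec. 3. Added section.\n"
removed, added = classify_diff(old, new)
```

Here the changed second line shows up once as a removal and once as an addition, which is why an edited line appears as a red line followed by a green one.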


Federal Officials

The Federal Officials Top Tier Grid contains basic contact information for each federal official in the Sunlight Labs Legislator API dump. When an official is clicked on, each of the available profiles, usernames, and datasets that CIVX knows about will appear alongside their congressional head-shot in a subgrid. Each link opens into a new CIVX Tab, allowing multiple sites and services to be browsed simultaneously.

https://fedorahosted.org/releases/c/i/civx/tour/civx-federal-officials.png


https://fedorahosted.org/releases/c/i/civx/tour/civx-govtrack.png

Stimulus data

https://fedorahosted.org/releases/c/i/civx/tour/civx-federal-stimulus.png


New York

The New York State Officials Grid lets users view each representative's county and listed address information, provided by Sunlight NY.

https://fedorahosted.org/releases/c/i/civx/tour/civx-officials-grid.png

The subgrids pictured below allow users to view the numerous projects and agencies that each representative has worked on as well as the fiscal year and funding of each project.

https://fedorahosted.org/releases/c/i/civx/tour/civx-officials-member-items.png

The metrics bar graph for the Top 10 Member Items by Official can be accessed by clicking the small metrics icon above the Member Items and State Officials Grids.

https://fedorahosted.org/releases/c/i/civx/tour/ny-top10-itemsbyofficial.png
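
A "Top 10" metric of this kind comes down to summing a value per official and keeping the largest totals. The sketch below assumes the member items are available as simple (official, amount) pairs. That shape, the function name, and the sample values are assumptions made for illustration, not the actual CIVX model.

```python
from collections import defaultdict


def top_n_by_official(member_items, n=10):
    """Sum funding per official and return the n largest totals."""
    totals = defaultdict(float)
    for official, amount in member_items:
        totals[official] += amount
    # Largest total first; ties are broken by name so the order is stable.
    ranked = sorted(totals.items(), key=lambda pair: (-pair[1], pair[0]))
    return ranked[:n]


# Invented sample data.
items = [
    ("Official A", 5000.0),
    ("Official B", 12000.0),
    ("Official A", 9000.0),
    ("Official C", 100.0),
]
top = top_n_by_official(items, n=2)
```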

The metrics bar graph for the Top 10 Lobbyist Combined Expenses can be accessed by clicking on the small metrics icon above the Lobbyist Financials Grid.

https://fedorahosted.org/releases/c/i/civx/tour/ny-top10-expenses.png

The metrics bar graph for the Top 10 Lobbyist Combined Compensation can be accessed by clicking on the small metrics icon above the Lobbyist Financials Grid.

https://fedorahosted.org/releases/c/i/civx/tour/ny-top10-compensation.png

Feeds Library

CIVX has aggregated a multitude of feeds from government and transparency organizations, with the intent of allowing users to gather current information quickly and easily from across the web. One of the primary goals is to enable users to keep abreast of active legislation, congressional votes, and political videos such as presidential addresses and congressional reports.

https://fedorahosted.org/releases/c/i/civx/tour/civx-feeds-activelegislation-open.png

https://fedorahosted.org/releases/c/i/civx/tour/civx-feeds-housefloor.png

https://fedorahosted.org/releases/c/i/civx/tour/civx-feeds-obama-uploads.png

https://fedorahosted.org/releases/c/i/civx/tour/civx-feeds-youtube.png
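
Aggregating feeds amounts to parsing each one and merging the entries by date. The sketch below does this for RSS 2.0 documents using only the Python standard library. The feed contents are invented, every item is assumed to carry a pubDate, and the helper names are hypothetical rather than part of CIVX.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime


def parse_rss(xml_text):
    """Return (published, title, link) tuples from an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    entries = []
    for item in root.iter("item"):
        # Assumes each item has a pubDate in RFC 2822 format.
        published = parsedate_to_datetime(item.findtext("pubDate"))
        entries.append((published, item.findtext("title"), item.findtext("link")))
    return entries


def merge_feeds(feeds):
    """Combine the entries of several feeds, newest first."""
    merged = [entry for xml_text in feeds for entry in parse_rss(xml_text)]
    return sorted(merged, key=lambda entry: entry[0], reverse=True)


# Invented feeds for illustration.
FEED_A = """<rss version="2.0"><channel><title>Feed A</title>
<item><title>Older item</title><link>http://example.org/a</link>
<pubDate>Mon, 05 Jan 2009 10:00:00 +0000</pubDate></item>
</channel></rss>"""

FEED_B = """<rss version="2.0"><channel><title>Feed B</title>
<item><title>Newer item</title><link>http://example.org/b</link>
<pubDate>Tue, 06 Jan 2009 09:30:00 +0000</pubDate></item>
</channel></rss>"""

merged = merge_feeds([FEED_A, FEED_B])
titles = [title for _, title, _ in merged]
```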

Project specific

As an open source project, CIVX strives to implement features that make it easier for users and developers to understand where our information comes from, how to reuse it, and how to inform us of any inconsistencies they find. The view source links above many of the graphs show developers how our project acquires, builds, and displays the information we get from other public sources. It is extremely important to us to allow others to build on top of our project, so we integrate the source code to encourage development.

https://fedorahosted.org/releases/c/i/civx/tour/civx-viewsource.png

https://fedorahosted.org/releases/c/i/civx/tour/civx-widget-source.png

Attributing credit to others is a key part of our mission. We do this on the attribution page, accessible through the attribution link on CIVX's bottom dock. We attempt to link back to the creators of everything from our icon sets to our core technologies. On the attribution page you will notice each company or project's icon as well as a link to the technology used and the corresponding license or legal notice for each.

https://fedorahosted.org/releases/c/i/civx/tour/civx-attribution.png

CIVX's Wiki allows anyone with a Fedora account to become a CIVX contributor. We plan on using the wiki for outlining development milestones, keeping track of collaboratively edited documents, and hosting our technical and how-to documents.

https://fedorahosted.org/releases/c/i/civx/tour/civx-wiki.png

As advocates of open source, we find that the AGPL fits our project like a glove.

https://fedorahosted.org/releases/c/i/civx/tour/civx-agpl.png


Dataset Life Cycle

Modeled

  • Column filters created
  • Columns defined and documented
  • Model specific logic functions defined
  • Database destination declared
  • Data source declared
  • Validation schema declared
  • Model test cases implemented
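
The modeling steps above can be sketched with a small, self-documenting model class. This example uses standard-library dataclasses purely for illustration. The MemberItem name, its columns, and the validation bounds are assumptions, and the real CIVX models may be defined quite differently.

```python
from dataclasses import dataclass, field, fields


@dataclass
class MemberItem:
    """Hypothetical model for a single member item row."""

    # Columns defined and documented via field metadata.
    official: str = field(metadata={"doc": "Name of the sponsoring official"})
    agency: str = field(metadata={"doc": "Agency receiving the funds"})
    fiscal_year: int = field(metadata={"doc": "Fiscal year of the item"})
    amount: float = field(metadata={"doc": "Funding amount in dollars"})

    def validate(self):
        """Return a list of problems; an empty list means the row is valid."""
        problems = []
        if not self.official.strip():
            problems.append("official is empty")
        if not 1900 <= self.fiscal_year <= 2100:  # arbitrary sanity bounds
            problems.append("fiscal_year out of range")
        if self.amount < 0:
            problems.append("amount is negative")
        return problems


def column_docs(model):
    """Map each column name to its documentation string."""
    return {f.name: f.metadata["doc"] for f in fields(model)}


# A trivial stand-in for "model test cases".
good = MemberItem("Official A", "Agency X", 2009, 5000.0)
bad = MemberItem("", "Agency X", 2009, -1.0)
```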


Scraped

  • Dataset located - usually URL
  • Raw Data Downloaded - ex. NYSBOE/2009jan.exe
  • Raw Data Revision Controlled

Parsed

  • Data mapped to object model
  • Addresses/Districts Geocoded*
  • Relationships/Foreign Keys Established
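
As a rough illustration of the parsing stage, the sketch below maps CSV rows onto objects and links each row to an already known official by id, which is the in-memory equivalent of a foreign key. Geocoding is left out. The class names, column names, and sample data are all hypothetical.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Official:
    id: int
    name: str


@dataclass
class Item:
    official_id: int  # plays the role of a foreign key to Official.id
    amount: float


def parse_items(csv_text, officials_by_name):
    """Map CSV rows to Item objects, linking each one to a known Official."""
    items, unmatched = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        official = officials_by_name.get(row["official"].strip())
        if official is None:
            unmatched.append(row)  # no relationship could be established
            continue
        items.append(Item(official_id=official.id, amount=float(row["amount"])))
    return items, unmatched


# Invented sample data.
officials = {"Official A": Official(1, "Official A")}
CSV_TEXT = "official,amount\nOfficial A,100.50\nUnknown Person,5\n"
items, unmatched = parse_items(CSV_TEXT, officials)
```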

Scrubbed

  • Source Repository Declared
  • Raw Data Converted to Open Formats
  • Data efficiently parsed
  • Raw Data Scrubbed - civx.utils.scrubber(2009jan.csv)
  • Data validated and sanitized
  • Invalid data identified and flagged
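
The validation and flagging steps might look like the following sketch. It is not the civx.utils.scrubber function mentioned above, whose signature is not documented here. The scrub_rows helper, the column names, and the sample rows are assumptions for illustration only.

```python
import csv
import io


def scrub_rows(csv_text):
    """Split rows into sanitized valid rows and flagged invalid rows."""
    valid, flagged = [], []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        line_no = reader.line_num  # source line the row ended on
        # Sanitize: trim whitespace and strip currency formatting.
        name = (row.get("name") or "").strip()
        raw_amount = (row.get("amount") or "").strip()
        try:
            amount = float(raw_amount.replace("$", "").replace(",", ""))
        except ValueError:
            amount = None
        if not name or amount is None:
            flagged.append((line_no, row))  # keep the original row for review
        else:
            valid.append({"name": name, "amount": amount})
    return valid, flagged


# Invented sample data: one clean row, one missing name, one bad amount.
RAW = 'name,amount\n  Lobbyist A ,"$1,200.50"\n,300\nLobbyist C,n/a\n'
valid, flagged = scrub_rows(RAW)
flagged_lines = [line_no for line_no, _ in flagged]
```

Flagged rows are kept alongside their line numbers rather than dropped, so that problems in the source data can be reported back.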

Populated

  • Model objects committed to database
  • Scrubbed dataset committed to revision control
  • Messages sent to AMQP broker announcing new entries
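
The population step can be sketched as a database transaction followed by one announcement per new entry. In this example an in-memory SQLite database stands in for the real database, and a plain callback stands in for the AMQP broker. The table name, topic name, and message format are all assumptions.

```python
import json
import sqlite3


def populate(conn, rows, publish):
    """Insert rows in one transaction, then announce each new entry."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.executemany(
            "INSERT INTO member_items (official, amount) VALUES (?, ?)", rows
        )
    for official, amount in rows:
        # Stand-in for publishing to an AMQP broker; the topic is hypothetical.
        body = json.dumps({"official": official, "amount": amount})
        publish("civx.member_items.new", body)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE member_items (official TEXT, amount REAL)")

sent = []  # collects (topic, body) pairs instead of talking to a broker
populate(
    conn,
    [("Official A", 5000.0), ("Official B", 12000.0)],
    publish=lambda topic, body: sent.append((topic, body)),
)
count = conn.execute("SELECT COUNT(*) FROM member_items").fetchone()[0]
```

Messages are published only after the transaction has committed, so subscribers are never told about rows that were rolled back.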

Correlated

  • Data is dynamically associated and related to existing data

Visualized

  • Models sent to Grid Controller
  • Grid Controller introspects objects
  • Rich interactive grids built
  • Models plotted and graphed - ex: Top 10 Graphs
  • Models exposed via a RESTful API
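
One way to picture a grid controller that introspects objects is a function that derives the columns from the model itself instead of hard-coding them. The sketch below uses dataclass introspection. The Lobbyist class, the build_grid helper, and the JSON layout are hypothetical and do not describe the actual CIVX Grid Controller.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class Lobbyist:
    name: str
    expenses: float
    compensation: float


def build_grid(objects):
    """Derive column names by introspection and return a grid description."""
    if not objects:
        return {"columns": [], "rows": []}
    columns = [f.name for f in fields(objects[0])]
    rows = [[getattr(obj, column) for column in columns] for obj in objects]
    return {"columns": columns, "rows": rows}


grid = build_grid([Lobbyist("Lobbyist A", 1200.5, 3000.0)])
payload = json.dumps(grid)  # roughly what a JSON endpoint could return
```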

Scraper API

CIVX offers a Scraper API that simplifies periodically downloading raw data from arbitrary web sites in a variety of ways. Once the data is downloaded and extracted, it is committed to a version control system (Git), which lets us keep track of what has been added, removed, or updated within the raw data. Because of this, we only need to parse new or modified data rather than re-parsing the entire dataset. Once committed, the raw data is scrubbed, parsed, and populated into the database in a highly scalable manner. Upon completion, CIVX automatically sends AMQP messages to the message broker, so users and other services become aware of changes immediately without having to poll continuously for them. CIVX also generates RSS/Atom feeds of the latest changes. These scrapers run automatically, at a given frequency, inside the Moksha Hub.
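
The benefit of only parsing new or modified data can be illustrated without Git itself. The sketch below compares content hashes between two snapshots of a raw data directory, which is similar in spirit to how Git identifies changed content. The changed_files helper and the file names are assumptions, and CIVX relies on Git rather than on code like this.

```python
import hashlib


def changed_files(previous, current):
    """Return names of files that are new or whose content has changed.

    Both arguments map a file name to that file's raw bytes.
    """

    def digest(data):
        return hashlib.sha1(data).hexdigest()

    old_digests = {name: digest(data) for name, data in previous.items()}
    return sorted(
        name
        for name, data in current.items()
        if old_digests.get(name) != digest(data)
    )


# Invented snapshots of a raw data directory.
before = {"2009jan.csv": b"a,b\n1,2\n", "2009feb.csv": b"a,b\n3,4\n"}
after = {
    "2009jan.csv": b"a,b\n1,2\n",  # unchanged, can be skipped
    "2009feb.csv": b"a,b\n3,5\n",  # modified
    "2009mar.csv": b"a,b\n6,7\n",  # new
}
to_parse = changed_files(before, after)
```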

The Scraper API provides a wide range of methods that help developers perform complicated tasks easily, such as converting Excel spreadsheets or HTML into raw CSV files, fetching all links from a webpage, and generating RSS feeds from new data.
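
To give a feel for one of these tasks, here is a standard-library sketch of collecting all links from a page. It does not use the CIVX Scraper API, whose method names are not listed here. The LinkCollector class, the extract_links helper, and the sample HTML are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the absolute URL of every <a href="..."> on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, href))


def extract_links(html, base_url):
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links


# Invented page content; a real scraper would download the HTML first.
PAGE = (
    '<p><a href="/data/2009jan.csv">January</a> '
    '<a href="http://example.org/x">X</a> '
    '<a name="top">no href</a></p>'
)
links = extract_links(PAGE, "http://example.org/index.html")
```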

Development Guide

Check out the development repo

git clone git://git.fedorahosted.org/git/civx; cd civx

Check out & install Moksha

git clone git://git.fedorahosted.org/git/moksha; cd moksha

If you're using a yum-based distribution, you'll need some packages

yum -y install python-virtualenv gcc yum-utils
yum-builddep -y python-lxml pyOpenSSL python-sqlite2

If you're using Ubuntu

apt-get install curl python-dev build-essential python-virtualenv python-sqlite python-openssl python-lxml python-twisted

Set up the TurboGears2 & Moksha virtualenv and start Orbited, the WSGI stack, and the moksha-hub

./start-moksha

Enter the Python virtualenv

source tg2env/bin/activate

Ensure the Moksha demo dashboard works

Stop Moksha

./stop-moksha

"Install" this CIVX tree

cd ..
python setup.py develop

Run the CIVX WSGI stack

paster serve development.ini

View the default dashboard

Run a scraper (defined on the [moksha.stream] entry-point in setup.py)

moksha-hub

Initialize the app (runs things from civx/websetup.py)

paster setup-app development.ini

If you add a widget/app/scraper/etc to an entry-point, you'll need to regenerate the egg_info

python setup.py egg_info
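
For orientation, an entry in the [moksha.stream] group follows the usual setuptools "name = module:attribute" form. The sketch below shows what such an entry might look like and how the string breaks into its parts. The scraper name, module path, and class name are invented for illustration and do not refer to real CIVX code.

```python
# Hypothetical excerpt of the entry_points mapping that setup.py might pass
# to setup(). Only the group name "moksha.stream" comes from this guide.
ENTRY_POINTS = {
    "moksha.stream": [
        # name = module:attribute (both sides are made up for this example)
        "example_scraper = civx.scrapers.example:ExampleScraper",
    ],
}


def parse_entry_point(spec):
    """Split 'name = module:attr' into (name, module, attr)."""
    name, _, target = spec.partition("=")
    module, _, attr = target.strip().partition(":")
    return name.strip(), module, attr


parsed = [parse_entry_point(spec) for spec in ENTRY_POINTS["moksha.stream"]]
```

Because these entries are recorded in the generated egg_info, a newly added entry is not visible until the egg_info has been regenerated.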

Starting from scratch

deactivate
rm -fr tg2env
./start-moksha

Problems

distutils.errors.DistutilsPlatformError: invalid Python installation: unable to open /usr/lib/python2.6/config/Makefile (No such file or directory)

You must have the python-devel package installed (python-dev on Ubuntu).
