CIVX is an open-source public data aggregation framework that focuses on government transparency and on transforming raw data into open, indexable formats. Our core platform is designed to simplify the process of modeling, scraping, scrubbing, correlating, and visualizing raw data.
This tour is intended to point out the types of information that CIVX aggregates, as well as the nuances of the interface. Please feel free to contact us with any questions.
The CIVX Landing page outlines our mission and technical requirements. From here you can access the CIVX Navigation Menu on the left side of the page. The navigation contains three top-tier items: State, Federal, and Feeds.
CIVX contains an experimental app for tracking and viewing changes in the US Code. It utilizes Archive.org to find as many versions of the US Code from uscode.house.gov as possible, and commits them to a version control system. The topmost grid contains the scraped file name, the chapter, and the title of the section of the United States Code. Selecting a file will display a list of changes to that file in the bottom grid. You can then select each row to view the actual changes to the file.
When a specific revision is selected, a "diff" is shown between that revision and the last changeset committed. Red text indicates a line that has been removed, and green text indicates a line that has been added.
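The red/green rendering is a standard unified diff. As an illustrative sketch (not CIVX's actual code, and with made-up sample text), Python's difflib produces the same removed/added line markers:

```python
import difflib

# Two hypothetical revisions of a section of the US Code.
old = ["Sec. 1. Definitions.", "The term 'vessel' includes every craft."]
new = ["Sec. 1. Definitions.", "The term 'vessel' includes every watercraft."]

# unified_diff prefixes removed lines with '-' and added lines with '+',
# which is what the red/green highlighting in the UI conveys.
diff = list(difflib.unified_diff(old, new, fromfile="rev1", tofile="rev2",
                                 lineterm=""))
for line in diff:
    print(line)
```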
The Federal Officials Top Tier Grid contains basic contact information for each federal official in the Sunlight Labs Legislator API dump. When an official is clicked, the available profiles, usernames, and datasets that CIVX knows about appear alongside their congressional headshot in a subgrid. Each link opens in a new CIVX Tab, allowing multiple sites and services to be browsed simultaneously.
The New York State Officials Grid allows users to view basic information about what county each representative is from as well as their listed address information, provided by Sunlight NY.
The subgrids pictured below allow users to view the numerous projects and agencies that each representative has worked on as well as the fiscal year and funding of each project.
The metrics bar graph for the Top 10 Member Items by Official can be accessed by clicking the small metrics icon above the Member Items and State Officials Grids.
The metrics bar graph for the Top 10 Lobbyist Combined Expenses can be accessed by clicking on the small metrics icon above the Lobbyist Financials Grid.
The metrics bar graph for the Top 10 Lobbyist Combined Compensation can be accessed by clicking on the small metrics icon above the Lobbyist Financials Grid.
CIVX has aggregated a multitude of feeds from government and transparency organizations, with the intent of allowing users to gather current information quickly and easily from across the web. One of the primary goals is to enable users to keep abreast of active legislation, congressional votes, and political videos such as presidential addresses and congressional reports.
As an open source project, CIVX strives to implement features that make it easier for users and developers to understand where our information comes from, how to reuse it, and how to inform us of any inconsistencies they find. The "view source" links above many of the graphs show developers how our project acquires, builds, and displays the information we get from other public sources. It is extremely important to us to allow others to build on top of our project, so we integrate the source code to encourage development.
Attributing credit to others is a key part of our mission; we do this on the attribution page, accessible through the attribution link on CIVX's bottom dock. We attempt to attribute links back to the creators of everything from our icon sets to our core technologies. On the attribution page you will see each company or project's icon, as well as a link to the technology used and the corresponding license or legal notice for each.
CIVX's Wiki allows anyone with a Fedora account to become a CIVX contributor. We plan on using the wiki for outlining development milestones, keeping track of collaboratively edited documents, and hosting our technical and how-to documents.
As advocates of open source, we find that the AGPL fits our project like a glove.
- Column filters created
- Columns defined and documented
- Model specific logic functions defined
- Database destination declared
- Data source declared
- Validation schema declared
- Model test cases implemented
- Dataset located - usually a URL
- Raw data downloaded - ex. NYSBOE/2009jan.exe
- Raw Data Revision Controlled
- Data mapped to object model
- Addresses/Districts Geocoded*
- Relationships/Foreign Keys Established
- Source Repository Declared
- Raw Data Converted to Open Formats
- Data efficiently parsed
- Raw Data Scrubbed - civx.utils.scrubber(2009jan.csv)
- Data validated and sanitized
- Invalid data identified and flagged
- Model objects committed to database
- Scrubbed dataset committed to revision control
- Messages sent to AMQP broker announcing new entries
- Models sent to Grid Controller
- Grid Controller introspects objects
- Rich interactive grids built
- Models plotted and graphed - ex: Top 10 Graphs
- Models exposed via a RESTful API
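The scrub/validate/flag steps in the pipeline above can be sketched in a few lines of Python. This is a minimal illustration only; the field names, scrubbing rule, and validation rule are invented for the example and are not CIVX's real schema:

```python
import csv
import io

# Hypothetical raw rows as they might arrive from a scraped CSV file.
raw = "name,amount\n  Jane Doe ,1500\nBad Row,notanumber\n"

def scrub(row):
    """Normalize whitespace in every field (stand-in for the scrubber)."""
    return {k: v.strip() for k, v in row.items()}

def validate(row):
    """Stand-in for a model's validation schema: amount must be an integer."""
    return row["amount"].isdigit()

valid, flagged = [], []
for row in csv.DictReader(io.StringIO(raw)):
    row = scrub(row)
    (valid if validate(row) else flagged).append(row)

print(valid)    # rows ready to be committed to the database
print(flagged)  # invalid rows identified and flagged for review
```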
CIVX offers a Scraper API that trivializes the act of periodically downloading raw data from arbitrary websites in a variety of ways. Once the data is downloaded and extracted, it is committed to a version control system (Git), which lets us keep track of what has been added, removed, or updated within the raw data. Using a revision control system means we only need to parse new or modified data, rather than re-parsing the entire dataset. Once committed, the raw data is scrubbed, parsed, and populated into the database in a highly scalable manner. Upon completion, CIVX automatically sends AMQP messages to the message broker, which allows users and other services to become aware of changes immediately, without having to poll continuously. CIVX also handles generating RSS/Atom feeds of the latest changes. These scrapers run automatically, at a given frequency, inside the Moksha Hub.
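The reason revision control saves so much work is simple: only content that actually changed needs to be re-parsed. The sketch below mimics that idea with content hashes and an in-memory dict; CIVX itself uses Git for this, and the file names and data here are made up:

```python
import hashlib

# Hashes of the files as they existed after the previous scrape.
previous = {"2009jan.csv": hashlib.sha1(b"old contents").hexdigest()}

# Files fetched on this run.
fetched = {
    "2009jan.csv": b"old contents",    # unchanged -> skipped
    "2009feb.csv": b"brand new data",  # new -> parsed
}

# Only files whose content hash differs (or is new) get re-parsed.
to_parse = [
    name for name, data in fetched.items()
    if previous.get(name) != hashlib.sha1(data).hexdigest()
]
print(to_parse)
```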
The Scraper API provides a wealth of useful methods that help developers accomplish complicated tasks easily, such as converting Excel spreadsheets or HTML into raw CSV files, fetching all links from a webpage, generating RSS feeds from new data, and much more.
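As a rough illustration of the "fetch all links from a webpage" idea, the standard-library HTMLParser can collect anchor targets in a few lines. This is a sketch, not the actual Scraper API, and the sample page and paths are invented:

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect href targets from <a> tags on a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(v for k, v in attrs if k == "href")

page = ('<html><body><a href="/data/2009jan.zip">Jan</a> '
        '<a href="/data/2009feb.zip">Feb</a></body></html>')
collector = LinkCollector()
collector.feed(page)
print(collector.links)
```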
Check out the development repo
git clone git://git.fedorahosted.org/git/civx; cd civx
Check out & install Moksha
git clone git://git.fedorahosted.org/git/moksha; cd moksha
If you're using a yum-based distribution, you'll need some packages
yum -y install python-virtualenv gcc yum-utils python-lxml pyOpenSSL python-sqlite2
If you're using Ubuntu
apt-get install curl python-dev build-essential python-virtualenv python-sqlite python-openssl python-lxml python-twisted
Setup the TurboGears2 & Moksha virtualenv and start Orbited, the WSGI stack, and the moksha-hub
Enter the Python virtualenv
source tg2env/bin/activate
Ensure the Moksha demo dashboard works
"Install" this CIVX tree
cd ..
python setup.py develop
Run the CIVX WSGI stack
paster serve development.ini
View the default dashboard
Run a scraper (defined on the [moksha.stream] entry-point in setup.py)
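Moksha discovers scrapers through setuptools entry points, as noted above. A registration in setup.py might look like the following excerpt; the scraper name and module path here are hypothetical:

```python
# setup.py (excerpt) -- hypothetical [moksha.stream] entry-point registration
entry_points = """
[moksha.stream]
uscode = civx.scrapers.uscode:USCodeScraper
"""
print(entry_points)
```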
Initialize the app (runs things from civx/websetup.py)
paster setup-app development.ini
If you add a widget/app/scraper/etc to an entry-point, you'll need to regenerate the egg_info
python setup.py egg_info
Starting from scratch
deactivate
rm -fr tg2env
./start-moksha
distutils.errors.DistutilsPlatformError: invalid Python installation: unable to open /usr/lib/python2.6/config/Makefile (No such file or directory)
You must have the python-devel package installed