Lab 15 - Databases - an introduction

In Lab 13 and Lab 14, you looked at tabular data, or data that can be contained in a single set of rows and columns. Using spreadsheets, you created formulas that related individual cells to other cells, and visualized the data you collected and created. But what happens when a more complex data structure is needed to model the relationships that exist? If we want to ask complicated questions of our data, we need to consider those collections of related tables called databases.

To consider:

Tables and queries

In this fifteenth lab, we'll look at visual representations of the differences between the tables found in spreadsheets and the related tables found in databases. We'll also review what well-formed XML looks like, and try out some database queries. As you explore, review your answers from the To consider: section above. Have you changed your mind about any of them?

Learning more

  1. Visualizing the differences - when is a table more than a table?

    • Even though we're pretty familiar with spreadsheet tables by now, let's take another look at one. The example below shows a table in a spreadsheet program. One thing to notice, and this is true of many tables in many spreadsheets, is that no calculations are present. The spreadsheet below is only used to hold tabular data - in this case, it's a website content inventory example from Donna Spencer's maadmob.com site:

      website inventory table

      Here, what you see is what you get. The familiar rows and columns are there, with each row representing a single page or subpage on the site. Even if we don't know for sure what the data in this table will be used for, we can be pretty confident about what a completed table would look like. For example, we can guess that the Last updated column would hold a date that represents the most recent update to the given web page represented by that row.

    • Database tables, when we look at them individually, look very much like the table shown above, with the same rows and columns. The difference between database tables and other tables becomes clear when we can see the relationships of those tables to each other. These relationships allow us to ask complicated questions of the data in those tables. The example below is one of the simpler collections of relationships for a database, from a Rails web development framework tutorial from BuildingWebApps:

      database tables

      This diagram is an abstracted representation of the tables, so we can see the relationships without having to sift through the data itself. Notice the relationship designators, "has_one", "belongs_to", and "has_and_belongs_to_many". If we look at the podcasts table, we can see that every podcast in podcasts has one shownote associated with it. Additionally, the categories_podcasts table in the center has been created with a join table operation, and is used to show the many-to-many (or has_and_belongs_to_many) relationships between the podcasts table and the categories table. It shows that each podcast has at least one and possibly many categories, and each category has at least one and possible many podcasts. This relationship description and modeling is what makes a relational database more than just a collection of individual tables.

    • Databases can be relatively simple, as in the example above, or extremely complex. If you'd like to see other examples, a Google image search for "database tables" will return many examples of database data models throughout that range. Barry Williams' databaseanswers.org also has many database data models available to review. The site is looking a little dated, but the resources are still useful. Let's look at Barry's data model for Accidents at Work. This particular example has a link to facts (or business rules) that are considered in this model. How are these facts reflected in the model?

      Now compare the Physical Data Model to the first model you looked at, the Conceptual Data Model. What are the differences? What additional information about the database can you find in the Dezign [sic] Data Model? What do you think the PK and FK notations stand for? How would you find out for sure?

  2. Rules for XML - a quick review

    • In order to get good results out of our XML databases, we need to put the right things into them in the right format. Not only does the data we put in need to be consistent in its format, but the XML needs to be well-formed. Using the example below (you can also download the file from badxml.txt), use the debugging strategies from earlier chapters to make corrections until it shows as well-formed. You can check your work as you go by viewing your xml in a browser (as long as the file extension is .xml). You can also check your work here, at the W3C Markup Validator: <Journal> <?xml version="1.0" encoding="UTF-8"?> <journal_entry id="1"> <body>Today is my first journal entry! I'm working on my XML skills.<body> <date_posted>Friday, June 15th, 2012</date_posted> </journal_entry> <journal_entry id='b'> <body>Enough XML, today I'm going to the beach! <pic>beachtime.jpg</pic> </body> <date_posted>Saturday, june 16th, 2012</date_posted> </journal_entry> <journal_entry id='3"> <body>Back to the XML grind...</body> <date_posted>Monday, June 18th</date_posted </journal_entry> <!-- don't forget to take this comment out when I'm done! --> </journal>
    • You may have noticed that just because the XML document is well-formed, that doesn't mean the data has been entered consistently. Take another look at this XML file, and check the following:
      • Are the date values all entered the same way?
      • Are quotes used the same way each time?
      • Are attribute values entered consistently?
      Once you have reviewed your XML file, check it against the goodxml.txt file.
    • Using the corrected XML file, create its database scheme, including identifying the primary key. Review that section of the chapter if you are having trouble.

  3. Try IT: Reviewing database query syntax
    Finally, we'll use w3schools' SQL Try It page to test out some SQL query syntax.

    • Try the canned queries on the w3schools' SQL Try It page first. Explain the results you get when you run the query: SELECT * FROM customers. Are there columns in your result that are not in the table on the main Try It page?
    • Write a query to retrieve all of the customers in Germany.
    • Run this query: SELECT CompanyName, ContactName FROM customers. If you need to get in touch with a customer right now, is this a useful result? Why or why not?
    • Is there a query that gets all of the addresses that start with a numeric value? How about all of the addresses except the ones that start with a numeric value? Hint: are numeric values "greater than" or "less than" alphabetical values?

Moving on

This brief overview has given you some tools for your database-building toolbox, which you'll put into action in the next chapter as you build your own database. If you are interested in digging a bit deeper into databases before you build your own, there are some online resources to check into. One is Stanford professor Jennifer Widom's online free class, Introduction to Databases. You can take the whole course for free by registering with your name and email address. And though there are many database applications out there, SQL (Structured Query Language) underpins many of them, and w3schools' SQL tutorial covers a lot of ground.

Because databases are more and more a part of what we do every day - from banking to blogs, from social networking to shopping - having a better understanding of how they work, where they are used, and why they are used is key to developing your fluency with information technology.