Static and Dynamic Data and Web Pages
Duration: 19:08
 

Takeaways:

  • TCP/IP is the networking model and set of communication protocols used for the Internet and similar networks
  • TCP/IP provides end-to-end connectivity specifying how data should be formatted, addressed, transmitted, routed and received at a destination
  • TCP/IP has four layers of abstractions which are used to sort various protocols based on the scope of networking involved: Link Layer, Internet Layer, Transport Layer, and Application layer
  • HTTP is a technology used to communicate between web clients and web servers and is part of the bigger TCP/IP protocol suite
  • The six main types of data you will deal with when working with D3 are Text, JSON, XML, HTML, CSV, and TSV
  • JavaScript and the XMLHttpRequest objects (XHR for short) provide a method for exchanging data asynchronously between browser and server to avoid full page reloads
  • D3 has Type Specific XHR functionality to load in data into the browser from a server: d3.text, d3.json, d3.xml, d3.html, d3.csv, and d3.tsv
  • Each Type Specific XHR D3.js request has certain features and methods for parsing the data that come in
  • The type specific XHR request invokes the callback function with two arguments - an error, if the request failed, and the data

Transcript:

Static and Dynamic Data and Web Pages

The Goal


This code makes a d3.json XHR type specific AJAX call to get and then use data from a server using D3.

// CAN, MEX & USA Feature Collection Object AJAX Call
d3.json('CAN_MEX_USA.geo.json', function(error, data) {

    // Stop if the request failed; data is undefined in that case
    if (error) {
        console.error(error);
        return;
    }

    // Use the D3 pattern to draw the 3 paths
    svgContainer.selectAll("path")
        .data(data.features)
      .enter().append("path")
        .attr("d",  function(d, i) { return geoPath(d); });
});

In this video, we are going to cover how this works, what it is doing and what to pay attention to when using D3.

We will also cover how to think about, load and use static and dynamic data sets in static and dynamic web pages.



HTTP Request/Response Basics


The acronym HTTP stands for Hyper Text Transfer Protocol.

// HTTP
// => Hyper Text Transfer Protocol

HTTP is a technology used to communicate between web clients and web servers.

It is part of a bigger family of protocols called TCP/IP.


TCP/IP is the networking model and set of communication protocols used for the Internet and similar networks.

// TCP/IP

// TCP
// => Transmission Control Protocol

// IP
// => Internet Protocol

TCP/IP provides end-to-end connectivity specifying how data should be formatted, addressed, transmitted, routed and received at a destination.


TCP/IP has four layers

  • Link Layer (single network segment)
  • Internet Layer (independent network connection)
  • Transport Layer (host-to-host communication)
  • Application Layer (protocols for specific data communication services on a process-to-process level)

TCP/IP has four layers of abstractions which are used to sort various protocols based on the scope of networking involved.

The lowest level is the Link layer and the highest level is the application layer.

The link layer covers the protocols that operate only on the local network segment that a host is connected to.

The internet layer routes datagrams to the next IP router that has connectivity to a network closer to the final destination.

The transport layer deals with opening and maintaining connections between internet hosts.

The application layer is where applications create user data and communicate this data to other hosts.

The application layer is where FTP, SSH, HTTP, HTTPS and others operate.


Application Layer
=> lower levels are "black boxes"

For the most part, the application layer treats the lower levels as black boxes.

That is, outside of a few special cases, it interfaces with the lower levels through APIs and is not concerned with how they work.

The way we deal with the application layer is through the web browser or client.


Web Browser / Client

The web browser / client primarily knows the HTTP language.

That means that a web browser sends and/or receives data by using the HTTP protocol.

Most web browsers know a few other protocols like FTP and HTTPS.


Web Browser / Client Main Functions

  • Send information
  • Retrieve information requested
  • Render information retrieved
  • Access other information

The main functions of a web browser should come as no surprise to you.

It retrieves and interprets information, and when you request new information it starts the retrieval process again.

If and when you decide to send information, it should enable you to do that as well.


HTTP Request / Response Life-Cycle

  • URL of website
  • Request routed to web server via TCP/IP layers
  • Web Server receives HTTP request, processes it and responds
  • Response routed to client via TCP/IP layers
  • Response rendered

Which leads us to what the HTTP Request/Response Life-Cycle looks like once it has left the web browser.

Every time you visit a webpage, request a URL and/or click on a link, behind the scenes a request to a web server is getting made.

And then, in turn, a response is being received from a web server.


This is a simple HTTP Transaction.

[ Image: Simple HTTP Transaction ]
source: http://blog.catchpoint.com/2010/09/17/anatomyhttp/

There are 6 steps to a simple HTTP transaction.

One - the DNS lookup

Two - Connecting to the correct server

Three - Sending an HTTP request to the server

Four - Waiting for the server and receiving the response

Five - Client loading the content of the response

Six - Closing the connection to the web server


Simple HTTP Transaction

  1. DNS Lookup
  2. Connect
  3. Send
  4. Wait & Receive
  5. Load
  6. Close

The DNS lookup step resolves the IP address of the server given the URL used.

The Connect step establishes a TCP connection with the IP address of the server.

The Send step sends the HTTP request to the web server.

The Wait and Receive step waits for the server to respond to the request.

The Load step loads the content that was received.

The Close step closes the TCP connection.

So the full life-cycle of each HTTP Request / Response is these six steps.
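
As a rough illustration of what the first three steps need from a web address, the sketch below splits a URL into the host name (used by the DNS lookup), the port (used by the connect step) and the path (sent in the HTTP request). It relies on the standard URL class found in modern browsers and in Node.js. The function name describeRequest and the example.com address are made up for this example.

```javascript
// Hypothetical helper: which parts of a URL feed the DNS lookup, connect and send steps.
function describeRequest(urlString) {
  var url = new URL(urlString);

  // url.port is an empty string when the scheme's default port is used:
  // 80 for http and 443 for https.
  var defaultPort = url.protocol === "https:" ? 443 : 80;

  return {
    host: url.hostname,                               // 1. DNS lookup resolves this name to an IP address
    port: url.port ? Number(url.port) : defaultPort,  // 2. Connect opens a TCP connection to this port
    path: url.pathname + url.search                   // 3. Send puts this path in the HTTP request
  };
}

// example.com is a placeholder host, not the real location of the file.
var parts = describeRequest("http://example.com/data/CAN_MEX_USA.geo.json");
console.log(parts.host, parts.port, parts.path);
```

The remaining steps (wait and receive, load, close) happen after the request has been sent, so they do not depend on the URL itself.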


Simple HTTP Transaction

  1. DNS Lookup
  2. Connect
  3. Send
  4. Wait & Receive
  5. Load
  6. Close

For our work with D3, the send, wait and load steps are the most important.

They are important because this is how we get and use data from a server with D3.


Send Step

=> HTTP Request

  • HTTP Method tells server what we want

=> HTTP Methods

  • "OPTIONS"
  • "GET"
  • "HEAD"
  • "POST"
  • "PUT"
  • "DELETE"
  • "TRACE"
  • "CONNECT"
  • "extension-method"

The SEND step is very important because it tells the server what we want.

This is done through the HTTP Methods of the HTTP Request.

You may have seen these HTTP methods before, especially if you are familiar with RESTful application program interfaces.

The GET request retrieves data from a web server by specifying parameters in the URL portion of the request.

The POST request uses a message body to send data to a web server.

The PUT request is like the POST request except that you are putting the data to an exact location.

The DELETE request is used to delete a resource from the server.

These are the main ones that we run into when programming interfaces to the web server to get, put, post and delete data from the program.
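
To make the role of the HTTP method more concrete, here is a minimal sketch that assembles the text of an HTTP/1.1 request message by hand. In practice the browser builds this message for us. The helper name buildRequest, the example.com host, the /cities path and the JSON body are all assumptions made for illustration.

```javascript
// Hypothetical helper that assembles the text of an HTTP/1.1 request message.
function buildRequest(method, host, path, body) {
  var headers = ["Host: " + host];

  // Requests that carry a message body (POST, PUT) describe it in headers.
  if (body !== undefined) {
    headers.push("Content-Type: application/json");
    // Content-Length is counted in bytes, so encode the body as UTF-8 first.
    headers.push("Content-Length: " + new TextEncoder().encode(body).length);
  }

  // Request line, then headers, then an empty line, then the optional body.
  return method + " " + path + " HTTP/1.1\r\n" +
    headers.join("\r\n") + "\r\n\r\n" +
    (body !== undefined ? body : "");
}

// GET: the parameters travel in the URL portion of the request.
console.log(buildRequest("GET", "example.com", "/cities?country=CAN"));

// POST: the data travels in the message body.
console.log(buildRequest("POST", "example.com", "/cities", '{"name":"Toronto"}'));
```

The first line of each message is where the method appears, which is how the server knows what we want before it reads anything else.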


Wait and Receive Step
=> Server gets the right resources and sends it back.

The Wait and Receive step is very important because it is where the server gets the data requested and sends it back to us.

This can either be a file of some kind or it can be a resource that exists in a database.

How this works and the best practices for this step are outside of the scope of this video.

What is important to discuss however, is that barring some unforeseen circumstances, we will get back the data requested.


Load Step
=> Load the content of the response
=> Read HTTP Response Code
=> Read Data Sent Back

The Load step is very important for two reasons.

One - the web server sends back an HTTP Response code to give us a message about what happened with the request we sent.

Two - if the request was successful, we receive and load the content of the response.


HTTP response Codes

  • 1xx - Informational
  • 2xx - Success
  • 3xx - Redirection
  • 4xx - Client Error
  • 5xx - Server Error

The HTTP response codes the server sends back fall within these five categories.

The 100 level responses are information indicating that the request was received and the process is underway.

The 200 level responses are information indicating that the request was received, understood, accepted and processed successfully.

The 300 level responses are information regarding how the client must take additional action to complete the request.

The 400 level responses are information regarding how the client made an error in the request that was sent to the server.

The 500 level responses are information regarding how the server failed to fulfill an apparently valid request.
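
The five categories can be sketched as a small lookup based on the first digit of the response code. The function name statusCategory is a made-up helper for this example and is not part of D3 or the browser.

```javascript
// Hypothetical helper: map an HTTP response code to its category.
function statusCategory(code) {
  var categories = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error"
  };

  // The hundreds digit of the code selects the category.
  var category = categories[Math.floor(code / 100)];
  return category || "Unknown";
}

console.log(statusCategory(200));
console.log(statusCategory(404));
console.log(statusCategory(503));
```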


HTTP Response Codes worth knowing

  • 200 - OK, successful HTTP Request
  • 202 - Request accepted, processing not completed
  • 204 - Request good, no content returned.
  • 301 - Requested content has been permanently moved
  • 400 - Bad request
  • 401 - Authentication failed, not authorized
  • 403 - Valid request, server refuses to respond
  • 404 - Content not found
  • 500 - Server encountered an error
  • 503 - Server service unavailable

These are common responses worth keeping in the back of your mind.

As you program or use the internet, they are the server responses that you will see most often.

So going back to the Load step, the most common response here will be the 200 one telling us that the HTTP request was successful.

This means that the data sent back will be exactly what we requested and it is ready for us to use.

What we then do with this data is up to us.

For the most part, since these videos have been about doing data visualization with D3, it means we will be visualizing the data that was returned.

Next, let's talk about data in static versus dynamic web pages.



Thinking About Data In Static And Dynamic Web Pages


Data Types

  • Text - text/plain
  • JSON - application/json
  • XML - application/xml
  • HTML - text/html
  • CSV - text/csv
  • TSV - text/tab-separated-values

These are the six main data types, listed with their MIME types, that we deal with when working with D3.

They should be self-explanatory.

The CSV stands for comma separated values.

The TSV stands for tab separated values.
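
To show how little separates CSV from TSV, here is a deliberately simplified parser sketch in which the only difference between the two formats is the delimiter. It is not the parser D3 uses: it assumes a header row and does not handle quoted fields or delimiters inside values. The function name parseDelimited and the sample temperatures are made up for illustration.

```javascript
// Simplified sketch: turn delimited text into an array of row objects.
// Assumes the first line is a header row and that no field contains the delimiter.
function parseDelimited(text, delimiter) {
  var lines = text.split(/\r?\n/).filter(function (line) {
    return line.length > 0; // ignore empty lines, such as a trailing newline
  });
  if (lines.length === 0) { return []; }

  var header = lines[0].split(delimiter);

  return lines.slice(1).map(function (line) {
    var cells = line.split(delimiter);
    var row = {};
    header.forEach(function (name, i) {
      row[name] = cells[i]; // values stay strings; convert to numbers later if needed
    });
    return row;
  });
}

// The same made-up data set, once as CSV and once as TSV.
var csvRows = parseDelimited("city,temp\nToronto,-5\nMexico City,14\n", ",");
var tsvRows = parseDelimited("city\ttemp\nToronto\t-5\nMexico City\t14\n", "\t");

console.log(csvRows);
console.log(tsvRows);
```

Both calls produce the same rows, which is why the choice between CSV and TSV usually comes down to which delimiter is less likely to appear inside the data itself.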


Static Data vs Dynamic Data

  • Generated Once vs Generated Multiple Times
  • Same when visited more than once?

When we think about whether data is static or dynamic, we can think about it in two different ways.

The normal way is to think about whether the data is generated once or whether it is generated more than once.

If it is generated once, then it is static data.

If it is generated multiple times, then it is dynamic data.

We can also think about it in terms of web URL / URI terms.

If we visit a URL that points to and returns a specific data set, we would say the data is static if regardless of when we visit the URL, the data set returned is the same.

If we visit a URL that points to and returns a specific data set, we would say the data is dynamic if the data set returned can change depending on when we visit the URL.

Why is it worth thinking about it the second way?

It is worth thinking about it the second way, the URL way, because we can get static data sets of dynamic data generation processes.

For instance, we can argue that weather is dynamic and not static.

Yet, we can get a static data set of the temperature and humidity of several locations from January 1st 2013.

So from now on, when we talk about static or dynamic data, we will be talking about data in the sense of whether the web address of a data set returns the same or different data set from the last request to that specific address.


Data Types + Static / Dynamic Data

  • Static / Dynamic Text
  • Static / Dynamic JSON
  • Static / Dynamic XML
  • Static / Dynamic HTML
  • Static / Dynamic CSV
  • Static / Dynamic TSV

Given that we can have different data type files and we can have either static or dynamic data, we can have different combinations of data types and files.

Whether they are coming from a database or being written to a file system that can be reached by URL, we can have all the different types of files.

Knowing what type of data we are dealing with now and what type of data we are going to be dealing with in the future is a big part of visualizing the data.

In addition to what type of data visualization we make, it also dictates how we load the data and what types of web pages we create.


Static vs Dynamic Web Page

We can think of web pages the same way we think of static and dynamic data.

If every time we load a web page it has the exact same content, then we can think of it as a static web page.

If at least some of the time that we load a web page it has different content then we can think of it as a dynamic web page.

If we then think about what type of data drives a web page to be static or dynamic, we can come to a pretty clear conclusion.


static data => static web page
dynamic data => dynamic web page

A static data set loaded into a web page would lead to a static web page.

The data does not change so the web page visualizing the data set does not change.

A dynamic data set loaded into a web page would lead to a dynamic web page.


Loading static data
- once and done

If we then think about how we load static data to the web page, we realize that we only have to do it once.

Since the data set will be the same regardless of when we access the resource through a URL, we only have to load it once and then we are done.

We can then visualize it or render it however we want.

Even though the data set is static, we have several things to think about...

We have to think about the size of the data set.

We have to think about what part of the static data set we will be using.

We also have to think about how the data set is being loaded into the client.


Loading dynamic data
- Rate of change matters
- Size of data matters

If we then think about how we load dynamic data to the web page, things become more complicated.

This is because we now have to think more carefully about how we treat the data that we receive from the server.

First, we have to figure out how frequently the dynamic data is changing.

Then we have to figure out how frequently we want to show this change on the web page.

Then we have to think about whether the data set is changing in size.

Then we have to think about how long the data is valid.

Then we have to think about whether it makes sense to cache the data in the web browser.

And just like with the static data set, we also have to think about the size of the data set.

We have to think about what part of the dynamic data set we will be using.

We also have to think about how the data set is being loaded or will be loaded into the client.

All of these questions lead us to thinking about how to load and use the right data for each type of page and data set.



Loading And Using Data In Static And Dynamic Web Pages


Loading Data Questions
- Data Size?

When we start thinking about our data and how we are going to use it, the most important question we need to ask is how big is the data set.

The rest of the questions we asked at the end of the previous section will be affected by this one question.

Depending on the answers to the size question, the way the data is loaded and used will cause the code we write on the server and on the client to be vastly different.

Sometimes more simple and other times much more complicated.


Loading Data Questions
- Data Size?
- How is data loaded?
- What is the frequency of change?

The size of the data set we are serving matters for four reasons.

Three which concern us and one which concerns our audience.

The three reasons which concern us are hosting, bandwidth and server related.

The bigger the data set we are hosting, the higher our costs to have the data readily available for serving.

Though the prices are minute, if the data set is accessed very frequently, the costs can add up very quickly.

The bigger the data set we are hosting, the higher our bandwidth costs to have the data being served from our server.

The bigger the data set we are transmitting, the more work the server will have to do in order to keep the connection open and to serve the data.

Which are important things to think about.

However, the most important thing to think about is our audience.

The bigger the file size the longer it will take to load in their web browser.

The slower the website the more likely our audience will leave.

So it is crucially important to think about how to minimize the wait time for the web browser to load the data.

If the data size is very small, then it makes sense to load it as part of the web page being loaded.

If the data size is larger than that but still moderate, then it makes sense to load it after the rest of the web page elements have been loaded and constructed.

If the data size is big or larger, then we have to start thinking about splitting the data into separate parts that we load either sequentially or as needed.
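
One way to picture the splitting idea is a small helper that cuts a large array of rows into fixed-size parts, which could then be written to separate files or served one at a time. This is only a sketch of the idea. The name splitIntoChunks and the chunk size used below are assumptions, and a real project would pick the size based on its own data.

```javascript
// Hypothetical helper: split a large array of rows into parts of at most chunkSize rows.
function splitIntoChunks(rows, chunkSize) {
  if (!(chunkSize >= 1)) {
    throw new RangeError("chunkSize must be at least 1");
  }

  var chunks = [];
  for (var i = 0; i < rows.length; i += chunkSize) {
    // slice does not modify the original array and stops at the end of the data
    chunks.push(rows.slice(i, i + chunkSize));
  }
  return chunks;
}

// Five made-up rows split into parts of two rows each.
var parts = splitIntoChunks(["a", "b", "c", "d", "e"], 2);
console.log(parts.length + " parts", parts);
```

Each part can then be requested with its own AJAX call, either one after another or only when the audience needs that part of the data.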


Loading Data Questions
- Data Size?
- How is data loaded?
- What is the frequency of change?
    => Caching?
    => Refresh Rate?
    => Stale Data?

The other question that comes up with dynamic data is how frequently the data set is changing.

Within this question, we have to start thinking about whether we are going to cache the data.

How often the data is being refreshed on the server side and how to implement it.

How often we need / want the data to be refreshed on the client side and how to implement that.

We also have to think about how to implement code that makes sure that stale data generates an alert so the audience on the client knows that they are looking at stale / old data.

Just like the data loading questions, depending on the frequency of change, these questions will lead to very different strategies on both the server side as well as the client side.
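
The caching and stale data questions can be sketched on the client side as a small cache that remembers when each data set was loaded and reports whether it is older than a chosen maximum age. This is a minimal sketch under stated assumptions: the names createCache, maxAgeMs and temperatures.json are made up, and the one minute limit is arbitrary.

```javascript
// Hypothetical client-side cache that flags data older than maxAgeMs as stale.
// The optional now parameter lets us substitute the clock, which helps when testing.
function createCache(maxAgeMs, now) {
  var clock = now || Date.now;
  var entries = Object.create(null); // plain lookup table keyed by URL

  return {
    // Remember the data together with the time it was loaded.
    set: function (url, data) {
      entries[url] = { data: data, loadedAt: clock() };
    },
    // Return the cached data and whether it is stale, or null if nothing is cached.
    get: function (url) {
      var entry = entries[url];
      if (!entry) { return null; }
      return {
        data: entry.data,
        stale: clock() - entry.loadedAt > maxAgeMs
      };
    }
  };
}

// Treat data as stale after one minute (an arbitrary choice for this example).
var cache = createCache(60 * 1000);
cache.set("temperatures.json", [{ city: "Toronto", temp: -5 }]);

var cached = cache.get("temperatures.json");
console.log(cached.data, cached.stale);
```

When the stale flag is true, the client code can either request fresh data or show an alert so the audience knows they are looking at old data. For a static data set, passing Infinity as the maximum age means the data is loaded once and never considered stale.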


One commonality that all of the different questions contain is that they all have to do with requesting, receiving and processing data asynchronously.

[ Image: AJAX Web Application Model ]

And to do asynchronous requests, we use the AJAX Technology.

Because JavaScript is single-threaded, the ability to make asynchronous AJAX calls keeps a slow request from stalling the rest of the program.

With AJAX, data is retrieved using XMLHttpRequest objects.

JavaScript and the XMLHttpRequest objects provide a method for exchanging data asynchronously between browser and server to avoid full page reloads.


XMLHttpRequest (xhr)

  • GET
  • POST
  • HEAD
  • PUT
  • DELETE
  • OPTIONS

XMLHttpRequest or XHR for short, is an API available in web browser scripting languages like JavaScript.

It is used to send HTTP and/or HTTPS requests directly to a web server and load the server response directly back into the script.

The data received from the server can be JSON, XML, HTML and/or plain text.

The methods that it uses are the same as HTTP.

The basics of the XHR object are that, once created, it goes through a series of states: opening a connection, sending the request, a ready state for when the HTTP content begins loading and a ready state for when the HTTP content has finished loading.

Only after the content has finished loading does the code that depends on the response run; the rest of the script keeps running in the meantime.
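
The states described above can be seen in a minimal sketch of a plain XMLHttpRequest GET. The function name loadText and its third parameter are assumptions made for this example. The parameter exists only so the sketch can be exercised outside a browser, where XMLHttpRequest is not defined.

```javascript
// Minimal sketch of an asynchronous GET with XMLHttpRequest.
// loadText and XhrConstructor are hypothetical names used for illustration.
function loadText(url, callback, XhrConstructor) {
  var Xhr = XhrConstructor || XMLHttpRequest;
  var xhr = new Xhr();

  // readyState moves through 0 (unsent), 1 (opened), 2 (headers received),
  // 3 (loading) and 4 (done). We only act once the response has finished loading.
  xhr.onreadystatechange = function () {
    if (xhr.readyState !== 4) { return; }

    if (xhr.status >= 200 && xhr.status < 300) {
      callback(null, xhr.responseText);
    } else {
      callback(new Error("Request failed with status " + xhr.status));
    }
  };

  xhr.open("GET", url, true); // true = asynchronous
  xhr.send();
}

// Browser-only usage; skipped where XMLHttpRequest is not defined.
if (typeof XMLHttpRequest !== "undefined") {
  loadText("CAN_MEX_USA.geo.json", function (error, text) {
    if (error) { console.error(error); return; }
    console.log("Received " + text.length + " characters");
  });
}
```

The callback follows the same error first, data second order that the D3 request functions use, so the rest of the page can keep working while the request is in flight.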


D3 has Type Specific XHR functionality to load in data into the browser from a server.

d3.xhr -->

d3.text(...)

d3.json(...)

d3.xml(...)

d3.html(...)

d3.csv(...)

d3.tsv(...)

This functionality covers

Text data

JSON data

XML data

HTML data

Comma Separated Values Data also called CSV

and Tab Separated Values Data also called TSV

Each type of request has certain features and methods for parsing the data that come in.


Each type specific XHR request creates a request for the specific mimeType file at the specified URL.

d3.text(url [, mimeType] [, callback])

d3.json(url [, callback])

d3.xml(url [, mimeType] [, callback])

d3.html(url [, callback])

d3.csv(url [, callback])

d3.tsv(url [, callback])

// callback(error,data)

The call either results in the resource being loaded or the request failing.

The type specific XHR request invokes the callback function with two arguments.

The error and the data.

If the request failed, the error argument describes why and the data argument will be undefined.

If no error was encountered (remember the HTTP response codes), then the data will contain the parsed result of the XHR request: for example, an array of row objects for d3.csv and d3.tsv, an object for d3.json, or a string for d3.text.

The data is parsed according to type specific D3.js parsers.
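
The two-argument callback can be sketched as a small wrapper that keeps the failure path and the success path apart. The helper name makeLoadHandler and the file name temperatures.csv are made up for illustration, and the usage assumes the callback-style signature d3.csv(url, callback) described above.

```javascript
// Hypothetical helper: wraps the callback(error, data) shape so the failure
// path and the success path are handled by separate functions.
function makeLoadHandler(onData, onError) {
  return function (error, data) {
    if (error) {
      onError(error); // the request failed; data is undefined here
      return;
    }
    onData(data);     // the request succeeded; data has already been parsed
  };
}

// Browser-only usage with a callback-style D3 request function.
// Skipped where D3 is not loaded; "temperatures.csv" is a made-up file name.
if (typeof d3 !== "undefined" && typeof d3.csv === "function") {
  d3.csv("temperatures.csv", makeLoadHandler(
    function (rows) { console.log("Loaded " + rows.length + " rows"); },
    function (error) { console.error("Could not load the data", error); }
  ));
}
```

Checking the error first matters because any code that reads from data after a failed request would be reading from an undefined value.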


Depending on the size of the data and the strategies being used to implement loading, refreshing, splitting up and serving the data, the code will fall within one of these three areas:

  • server code
  • client code
  • callback code

The server code is used to dictate how to best serve the data.

The client code is used to dictate when, where, how and how often the AJAX calls are being made to get new data.

Once the data comes back, the callback code is used to visualize, stitch together, update, remove and/or overwrite prior data that was already there.
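
As one example of callback code that updates prior data, the sketch below merges newly received rows into the rows we already have, overwriting any row that shares the same key and appending the rest. The function name mergeByKey, the city key and the sample temperatures are assumptions made for illustration.

```javascript
// Hypothetical helper: merge incoming rows into prior rows, matching on a key field.
// Rows with a key we have seen before replace the older row; new keys are appended.
function mergeByKey(prior, incoming, key) {
  var positions = Object.create(null); // key value -> index in merged
  var merged = [];

  prior.concat(incoming).forEach(function (row) {
    var value = row[key];
    if (value in positions) {
      merged[positions[value]] = row; // overwrite the older row in place
    } else {
      positions[value] = merged.length;
      merged.push(row);
    }
  });

  return merged;
}

// Made-up data: one city is updated and one city is new.
var prior = [{ city: "Toronto", temp: -5 }, { city: "Mexico City", temp: 14 }];
var incoming = [{ city: "Toronto", temp: -3 }, { city: "Chicago", temp: 1 }];

console.log(mergeByKey(prior, incoming, "city"));
```

The merged array can then be handed to the usual D3 data join so that existing elements are updated and new ones are added.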

Depending on your project and visualization, all of these areas will come into play.

Regardless of what code you end up writing, the big hurdle you will have to clear is what size is the data set that you are sending over.


Which brings us back to this d3 json XHR call.

// CAN, MEX & USA Feature Collection Object AJAX Call
d3.json('CAN_MEX_USA.geo.json', function(error, data) {

    // Stop if the request failed; data is undefined in that case
    if (error) {
        console.error(error);
        return;
    }

    // Use the D3 pattern to draw the 3 paths
    svgContainer.selectAll("path")
        .data(data.features)
      .enter().append("path")
        .attr("d",  function(d, i) { return geoPath(d); });
});

In the last video we combined the Canada, Mexico and USA files into one file so that we only had to do one AJAX call.

This helped make the code cleaner as there was less repetitive code and it also made it so that we requested something from the server only once.

The thing that could make this code even better would be to decrease the GeoJSON file size.

This one thing would make our visualizations much faster to load and visualize.


GeoJSON => TopoJSON

When doing geography visualizations, we have to worry about file sizes, because of the level of intricate details that we can include in the data.

Which is where TopoJSON comes in.

TopoJSON is an extension of GeoJSON that encodes topology.

Rather than representing geometries discretely, TopoJSON geometries are stitched together from shared line segments.

Typical TopoJSON files are 80% smaller than their GeoJSON equivalents.

Which really helps us when thinking about, loading and using data in static and dynamic web pages.

In the next video, we will explore TopoJSON, how it is constructed and how to use it.


And with that, we have covered HTTP Request and Response basics.


We also thought about data in static and dynamic web pages and what questions to ask when loading and using these types of data sets in static and dynamic web pages.
