Writing an AJAX Google Search Appliance Connector – Part 1: PHP
I started a summer internship with the University of Wisconsin-Whitewater in mid-March. The university just purchased a Google Search Appliance and my job is the rollout of the appliance. Right now the university’s search is running on a Google Mini, which is quickly being outgrown.
So, we’re developing a web service to provide the search experience. We will be able to give different departments different front ends, exposing different bits of the result data as necessary for each department. This is all done by parsing the results data from the XML that the Google Search Appliance returns.
You can build your search interface using the GSA’s built-in Page Layout Helper, which allows you to write XSLT style sheets that will transform how the result XML is displayed. However, this precludes you from using any fancy JavaScript effects like using AJAX requests to load pages. Displaying AJAX “loading” indicators is a good way to make the search feel more responsive, even when the actual response time is unchanged. This is a feature the university wants to implement. We should also be able to move around in the results (view the next 10, previous 10, skip to a result page, etc.) without actually refreshing the page.
The “web service” approach also allows us to more easily wrap the search results in a page which can contain other relevant information such as current event updates, etc.
So… what exactly is a Web Service?
The term “web service”, in this case, just means a page whose output is not intended for humans to read. This is essentially the basis upon which all AJAX is built and is what pretentious people actually mean when they say something is “Web 2.0”.
We want the web service to accept a few variables (we’ll just send them as GET parameters), and spit back the results as a set of JSON encoded data. Once we have the service up and running, we can build as many front-ends to it as we like.
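To make the contract concrete, here is a rough sketch of the kind of JSON document the service might send back. The key names mirror the public properties of the class defined below; the values, and the exact shape of each entry, are made-up placeholders rather than real appliance output.

```php
<?php
// Hypothetical example data; real values come from the Search Appliance.
$response = array(
    'searchTerms'         => array('campus map'),
    'keyMatches'          => array(),
    'spellingSuggestions' => array(),
    'results'             => array(
        array(
            'T' => 'Campus Map',                            // title
            'S' => 'Find buildings and parking on campus.', // snippet
            'U' => 'http://www.example.edu/map',            // URL
        ),
    ),
    'matchCount'          => 1,
    'navigationURLs'      => array(),
);

// This string is what a front end would receive and parse.
$json = json_encode($response);
echo $json, "\n";
```

Any front end that can parse JSON can consume this, which is the whole point of separating the service from the page that displays the results.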
Creating the PHP Class
The first step of this is creating a PHP class to get the XML results from the Search Appliance. A PHP class (we’ll call it “GSAConnector”) will query the Search Appliance with supplied search terms and starting index, as well as a “style” variable, which will let the object know what set of data is needed for the results. This way we can specify what we want in the result set, and not bother passing around a bunch of data that we don’t want displayed on the results page.
<?php
class GSAConnector {
    const GOOGLE_BASE = "http://searchappliance.yoursite.com/search?output=xml_no_dtd&client=default_frontend&site=default_collection";

    // one public property for each type of data
    // the search appliance returns
    public $searchTerms;
    public $keyMatches;
    public $spellingSuggestions;
    public $results;
    public $matchCount;
    public $navigationURLs;

    function __construct($query, $start, $style){
        /*
         * Query the search appliance for the search terms in $query and
         * get the results starting at the number specified by $start.
         *
         * Return the result set specified by $style.
         * "full" - return full set of result data (all options, attributes)
         * "simple" (default) - return truncated set of results, omitting all
         * data except title, snippet and URL for each.
         */
        // URL-encode the search terms so spaces and special
        // characters survive the trip to the appliance
        $encodedQuery = urlencode($query);
        $googleQuery = self::GOOGLE_BASE . '&q=' . $encodedQuery;
        if(!empty($start)){
            // if a start index was passed, append
            // it to the search query (as an integer)
            $googleQuery .= "&start=" . intval($start);
        }
        // get the result XML from the Search Appliance
        $XMLResult = simplexml_load_file($googleQuery);
        if($XMLResult === false){
            // the appliance was unreachable or returned malformed XML
            echo '{"error":"Could not get results from the search appliance."}';
            return;
        }
        switch($style){
            // return the result set requested by $style
            case "full":
                $this->getFullResults($XMLResult);
                break;
            default:
                $this->getSimpleResults($XMLResult);
                break;
        }
        // echo out the JSON data for the requested result set
        if(version_compare(PHP_VERSION, '5.2.0', '>=')){
            echo json_encode($this);
        } else {
            require_once('Zend/Json.php');
            echo Zend_Json::encode($this);
        }
    }
}
?>
You’ll notice that the GSAConnector object will call the Zend_Json::encode() method on itself if the version of PHP on the server is less than 5.2; this is because PHP 5.2 and greater has the json_encode() and json_decode() functions built in. If you’re running a version earlier than PHP 5.2 you can download the Zend Framework from their website.
The json_encode() function will capture all the data contained in each of the public variables we declared in the class header. It will ignore any private variables. This allows us to write some functions to get the data from the XML and only return the parts we see fit.
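A quick way to convince yourself of the public/private behavior is a throwaway class like the one below. VisibilityDemo and its properties are hypothetical names used only for this demonstration; they are not part of the connector.

```php
<?php
// Hypothetical class, used only to demonstrate property visibility.
class VisibilityDemo {
    public $title = 'Campus Map';
    private $rawXml = '<GSP/>';

    public function toJson() {
        // Even when called from inside the class, json_encode()
        // serializes only the public properties.
        return json_encode($this);
    }
}

$demo = new VisibilityDemo();
echo $demo->toJson(), "\n"; // {"title":"Campus Map"}
```

So anything we want to keep out of the response (raw XML, intermediate values) can simply live in a private property.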
So, we need to write the getFullResults() and getSimpleResults() methods:
private function getSimpleResults($xml){
    /*
     * Get a selected set of data from the $xml
     */
    $this->searchTerms = $this->getXMLSearchTerms($xml);
    $this->keyMatches = $this->getXMLKeyMatches($xml);
    $this->spellingSuggestions = $this->getXMLSpellingSuggestions($xml);
    $this->matchCount = $this->getXMLMatchCount($xml);
    $this->navigationURLs = $this->getXMLNavURLs($xml);
    $this->results = $this->getXMLSimpleResults($xml);
}

private function getFullResults($xml){
    /*
     * Get the full result set from the $xml
     */
    $this->searchTerms = $this->getXMLSearchTerms($xml);
    $this->keyMatches = $this->getXMLKeyMatches($xml);
    $this->spellingSuggestions = $this->getXMLSpellingSuggestions($xml);
    $this->matchCount = $this->getXMLMatchCount($xml);
    $this->navigationURLs = $this->getXMLNavURLs($xml);
    $this->results = $this->getXMLFullResults($xml);
}
Right now, the two are almost identical. We’ve mostly included both for future expandability, once departments start requesting certain features or certain pieces are found unnecessary. Each of the helper functions (getXMLSearchTerms(), getXMLKeyMatches(), etc.) now needs to be defined. We’ll simply get the appropriate elements from the XML, throw them into an associative array, and return it. I won’t put them all here, but you can see that they’re all pretty similar:
private function getXMLNavURLs($xml){
    /*
     * Get the urls for the results page next and
     * previous links from the $xml and return
     * them in an array
     */
    $navigationArray = array();
    if(isset($xml->RES->NB->PU)){
        $navigationArray["PU"] = strval($xml->RES->NB->PU);
    }
    if(isset($xml->RES->NB->NU)){
        $navigationArray["NU"] = strval($xml->RES->NB->NU);
    }
    return $navigationArray;
}

private function getXMLSimpleResults($xml){
    /*
     * Get search results from $xml and return as
     * an array of associative $result arrays,
     * keeping only selected data
     */
    $results = array();
    if(isset($xml->RES)){
        // copy the attributes of the <RES> element itself
        foreach($xml->RES->attributes() as $attr => $value){
            $results[$attr] = strval($value);
        }
        $i = 0;
        foreach($xml->RES->R as $result){
            // get selected attributes for this result
            $results[$i]["T"] = strval($result->T);
            $results[$i]["S"] = strval($result->S);
            $results[$i]["U"] = strval($result->U);
            $i += 1;
        }
    }
    return $results;
}
And so on, for each set of data in the results. If you’re not sure where I’m getting the $xml->RES->NB->PU stuff, it’s just the hierarchy of elements defined by the Google Search Protocol Reference.
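For illustration, here is a rough sketch of how two of the omitted helpers might look, run against a small hand-written sample. The element names (Spelling, Suggestion with its q attribute, and M for the estimated match count) are taken from my reading of the Search Protocol Reference, so check them against what your appliance actually returns. The helpers are written as standalone functions so the snippet runs on its own; in the class they would be private methods.

```php
<?php
// Trimmed, hand-written sample in the shape of a GSA XML response.
$sampleXml = <<<XML
<GSP VER="3.2">
  <Q>campus mpa</Q>
  <Spelling>
    <Suggestion q="campus map">campus &lt;b&gt;map&lt;/b&gt;</Suggestion>
  </Spelling>
  <RES SN="1" EN="10">
    <M>42</M>
    <NB>
      <NU>/search?q=campus+mpa&amp;start=10</NU>
    </NB>
  </RES>
</GSP>
XML;

// Hypothetical standalone versions of two helpers left out above.
function getXMLMatchCount($xml) {
    // <M> is assumed to hold the estimated total number of matches
    return isset($xml->RES->M) ? intval(strval($xml->RES->M)) : 0;
}

function getXMLSpellingSuggestions($xml) {
    $suggestions = array();
    if (isset($xml->Spelling)) {
        foreach ($xml->Spelling->Suggestion as $suggestion) {
            // the "q" attribute carries the plain-text suggested query
            $suggestions[] = strval($suggestion['q']);
        }
    }
    return $suggestions;
}

$xml = simplexml_load_string($sampleXml);
var_dump(getXMLMatchCount($xml));
print_r(getXMLSpellingSuggestions($xml));
```

The pattern is the same each time: check that the element exists, cast what you need to a plain string or integer, and return a plain array that json_encode() can handle.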
Once you’ve parsed all the data you want from the result set, the class constructor will echo the results of the json_encode() method. We have a web service! Web 2.0! Now we can all go buy turtlenecks and berets and meet with some venture capitalists (or perhaps meet with the venture capitalists first, turtlenecks can be expensive).
But wait… how are we going to call this thing?
Ah yes, we need some way to actually create objects of the class. I’ll just create a simple non-object-oriented PHP file (we’ll call it “search.php”) which will parse the “GET” variables from the request, sanitize them and then create an instance of the class.
Remember, the class will automatically output the JSON data after it is parsed from the XML, so all we need to do is create an object. Should be pretty simple:
<?php
if(!empty($_GET['q']) && is_string($_GET['q'])) {
    /*
     * Get the search terms from $_GET, create a new GSAConnector
     * object, and pass it the search terms.
     *
     * The GSAConnector object will automatically return
     * JSON encoded result data upon being instantiated.
     */
    // the class URL-encodes the query, so trimming is enough here
    $query = trim($_GET['q']);
    $start = 0;
    $style = "simple";
    if(isset($_GET['start']) && !empty($_GET['start'])){
        // only accept a non-negative integer start index
        $start = max(0, intval($_GET['start']));
    }
    if(isset($_GET['style']) && $_GET['style'] === "full"){
        // only accept style names the class knows about
        $style = "full";
    }
    require_once('GSAConnector.php');
    $connector = new GSAConnector($query, $start, $style);
} else {
    // there were no search terms provided, don't
    // bother calling the GSAConnector class
    echo '{"error":"No search terms provided. ' .
         'Enter some words to search for and try again."}';
}
?>
So essentially, here’s what we’re doing:
- Get the search terms from the request URL
- If there are no search terms, don’t even bother calling the class, just report back an error (as JSON-encoded data).
- If there were some search terms, set defaults for the starting index and style (since these are not required).
- Create the GSAConnector object.
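If you would rather keep the input checks in one place, the steps above could be pulled into a small helper along these lines. This is only a sketch: cleanSearchParams() is a hypothetical name, and it assumes the only valid styles are "simple" and "full".

```php
<?php
// Hypothetical helper: normalise raw GET parameters before they
// are handed to GSAConnector.
function cleanSearchParams(array $get) {
    // ignore non-string values such as q[]=foo
    $query = (isset($get['q']) && is_string($get['q'])) ? trim($get['q']) : '';

    // start must be a non-negative integer; fall back to 0 otherwise
    $start = 0;
    if (isset($get['start'])) {
        $candidate = filter_var(
            $get['start'],
            FILTER_VALIDATE_INT,
            array('options' => array('min_range' => 0))
        );
        if ($candidate !== false) {
            $start = $candidate;
        }
    }

    // only allow the style names the class knows about
    $style = (isset($get['style']) && $get['style'] === 'full') ? 'full' : 'simple';

    return array('q' => $query, 'start' => $start, 'style' => $style);
}

print_r(cleanSearchParams(array(
    'q'     => ' campus map ',
    'start' => '20',
    'style' => 'bogus',
)));
```

Note that filter_var() is itself only available from PHP 5.2 onward, so on older servers a plain intval() check would have to do.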
So that should do it! Now we have a URL that we can call with a few GET style parameters that will create a GSAConnector object, which will query the Search Appliance, parse the results, and spit them back as JSON-encoded data.
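Before building a front end, it can help to sanity-check the service from a few lines of PHP. The sketch below builds a request URL and walks a decoded response. The service URL is a placeholder, and a canned JSON string stands in for the real HTTP call so the snippet runs anywhere. The canned data follows the shape getXMLSimpleResults() produces, where the RES attributes (SN, EN) sit alongside the numbered results.

```php
<?php
// Hypothetical location of search.php; adjust to wherever you deploy it.
$serviceBase = 'http://www.example.edu/search.php';

// http_build_query() URL-encodes each value for us
$url = $serviceBase . '?' . http_build_query(array(
    'q'     => 'campus map',
    'start' => 10,
    'style' => 'simple',
), '', '&');
echo $url, "\n";

// A canned response stands in for the real HTTP call here.
$body = '{"results":{"SN":"1","EN":"1","0":{"T":"Campus Map",'
      . '"S":"Find buildings and parking.","U":"http://www.example.edu/map"}},'
      . '"matchCount":1}';
$data = json_decode($body, true);

$titles = array();
if (isset($data['error'])) {
    echo 'Search failed: ', $data['error'], "\n";
} else {
    foreach ($data['results'] as $result) {
        // skip the SN/EN attributes, which are plain strings
        if (is_array($result)) {
            $titles[] = $result['T'];
            echo $result['T'], ' -> ', $result['U'], "\n";
        }
    }
}
```

Swapping the canned string for file_get_contents($url) would turn this into a real request, provided allow_url_fopen is enabled on your server.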
Next time: Creating the AJAX interface (with jQuery!)
This post was super helpful. Your wrapper certainly came in handy. Can’t wait to see part 2.
This is a nice example of building a proxy to the GSA. However, the typical need to do so is not to build a web 2.0 interface.
First, the GSA is so fast that adding the additional layer causes it to be slower and increases the cost of implementation as you are simply duplicating functionality. Second, you do not need a PHP layer to transform the result set into JSON or create a Web 2.0 style service. Both can be achieved with XSLT on the appliance. Third, if you want a Web 2.0 type of service then you’ll need to encode the response as JSONP. What you’ve done is simply transform the result set into JSON.
The general case for using a proxy is to shield the end user from the GSA. However, this has an adverse side effect when you want to use security trimming because you have to establish kerberos trusts or pass forms authentication cookies or redirects to SAML interfaces.
I’m interested in seeing part 2.
True, it’s really just a proxy that translates the results into JSON data. That was the route chosen by my department (I was literally told “write a PHP class that will translate the results from the Search Appliance into JSON”).
Doing it in the XSLT might actually be a better idea since it would undoubtedly be faster running right on the GSA. If you wanted to do a similar write up about how to accomplish that, I would be very interested!
One thing we wanted to minimize was links to old, possibly outdated pages. We’ve actually also turned caching off on our search appliance. Obviously, this is not the route that everyone would choose to go. It depends on your environment and needs.
Also, the real code we’re planning on deploying is more fleshed out and feature-complete than what I’m posting here. The actual search interface will “gracefully degrade” and the functionality will remain the same whether or not you have JavaScript enabled in your browser.
The only difference is that with JavaScript you will get the AJAX style interface where the results load without refreshing the page, which improves perceived response time. However, for these articles I tried to make the code as simple as possible so it will just be a simple web service with few features.
I may expand on it in some future “Part 3” where I detail transforming it to a gracefully degrading interface and update with any other features we end up implementing. We’re discussing adding “related web searches” and related Twitter results (from pre-determined Twitter accounts) to our results pages.
There’s always more that can be done!