The Challenges of a Tech Talk

It is a great feeling to be asked to give a tech talk on a topic of one’s choice. However, it comes with challenges. One must prepare a presentation, and the talk should appeal to both technical and non-technical audiences. One might know quite a bit about the technical aspects of the topic, but the non-technical listeners will probably be the ones who can best make use of the content.

My tech talk will be about search, as I have spent a fair amount of time figuring out how to get content records online that the user needs to find. Here is a list of some of the issues that I tackled:

  1. Content records need to be of a certain limited size. But many content fields are longer.
  2. What terms are searchable? What if editors change their minds?
  3. What if there are many PDFs on the site, but one only wants to index a few select ones?
  4. What kind of filters does one want for the site? What if the site is not yet organized in this fashion?
  5. What if the search company charges for each search request, so that every keystroke in the input box adds to the bill? Is there a way to lower the costs?
  6. What about typos: how many mistyped characters should still count as a match? If one tolerates typos of one character or more, does one get a lot of false positives? Would exact match be a better option?
  7. Some alternatives feature facets. Are these easy to use? Are there bugs associated with commonly used Drupal modules? Do filters and facets in ReactJS serve the user faster and in a clearer way?
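
To make the typo question in item 6 concrete, here is a minimal sketch using PHP's built-in levenshtein() function, which counts the single-character insertions, deletions, and substitutions needed to turn one string into another. The helper name matchWithTypos and the sample terms are invented for illustration; hosted search services have their own typo-tolerance settings, and this is not how any of them is implemented.

```php
<?php
// Hypothetical helper: return the terms that are within $maxTypos
// single-character edits of the query.
function matchWithTypos(string $query, array $terms, int $maxTypos): array
{
    $matches = [];
    foreach ($terms as $term) {
        // levenshtein() returns the edit distance between the two strings.
        $distance = levenshtein(strtolower($query), strtolower($term));
        if ($distance <= $maxTypos) {
            $matches[] = $term;
        }
    }
    return $matches;
}

// Invented sample terms, standing in for the searchable words of a site.
$terms = ['search', 'starch', 'march', 'report'];

$exact    = matchWithTypos('serch', $terms, 0); // exact match only
$oneTypo  = matchWithTypos('serch', $terms, 1); // one typo allowed
$twoTypos = matchWithTypos('serch', $terms, 2); // two typos allowed
```

With these sample terms, exact match finds nothing for the misspelled query "serch", one allowed typo finds only "search", and two allowed typos also let in "starch" and "march". That is the trade-off behind the question: a looser tolerance rescues more misspellings but brings in more false positives.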

Why not use Drupal search straight out of the box? Here are a few reasons: 1) it does not return enough results, 2) it is not fast, and 3) it has no type ahead. Type ahead is when you type in an input box and the app gives suggestions as you type. Finally, 4) with certain search software, you can build the front end with a library such as ReactJS, which gives you the above benefits plus more.
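
As a rough illustration of what type ahead does, the sketch below suggests titles in which any word starts with the text typed so far. The function name suggest, the limit parameter, and the sample titles are all hypothetical. A real search service answers these requests from an index on its server, so treat this only as a picture of the behavior, not of the implementation.

```php
<?php
// Hypothetical helper: suggest titles in which some word begins with
// the text the user has typed so far.
function suggest(string $typed, array $titles, int $limit = 5): array
{
    $typed = strtolower(trim($typed));
    if ($typed === '') {
        return []; // nothing typed yet, so nothing to suggest
    }

    $suggestions = [];
    foreach ($titles as $title) {
        foreach (preg_split('/\s+/', strtolower($title)) as $word) {
            // str_starts_with() requires PHP 8.0 or later.
            if (str_starts_with($word, $typed)) {
                $suggestions[] = $title;
                break; // one matching word is enough for this title
            }
        }
        if (count($suggestions) >= $limit) {
            break; // stop once the suggestion list is full
        }
    }
    return $suggestions;
}

// Invented sample titles.
$titles = [
    'Drupal search basics',
    'Typesense setup',
    'Search filters in ReactJS',
    'PDF indexing',
];

$afterTwoLetters   = suggest('se', $titles);
$afterThreeLetters = suggest('sea', $titles);
```

Typing "se" suggests the first three titles, because "search", "setup", and "Search" all start with those letters. Typing one more letter, "sea", narrows the list to the two titles that contain the word "search".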

What software is available that does have these features? The service that I used for clients was Algolia. When Algolia was a new company, even the head developer himself often answered questions. Now, however, Algolia is pricey for popular sites, and most support questions are answered by bots. A newer company offering search is Typesense. I am hoping it will provide an option at lower cost with better support. Both Algolia and Typesense still offer a free pricing level.

There is no way I could cover all these topics in one blog post or in a tech talk. I plan to write more posts detailing some of these issues and the solutions that I found. For this post, I will talk about a tool called Simple Html Dom Parser (voku/simple_html_dom) that I used to capture PDF links from a web page. If you look in the examples directory of Simple Html Dom Parser, you will find code that you can alter to get the results you want, such as capturing links into an array that can be used by a custom module. I took the following PHP code:

    use voku\helper\HtmlDomParser;

    // $str is assumed to hold the HTML string to parse.
    $html = HtmlDomParser::str_get_html($str);
    foreach ($html->find('ul') as $ul) {
        foreach ($ul->find('li') as $li) {
            echo $li->innertext . '<br>';
        }
    }

Basically, this code looks for all the unordered lists on the page. For each list item, it reads the innertext property, which holds the item's inner HTML, and then “echo”s, or prints, it on the screen.

To fetch the HTML from the page, I used Guzzle.

    // Guzzle needs a full URL, including the scheme; mydomain.com is a placeholder.
    $client = new \GuzzleHttp\Client();
    $response = $client->request('GET', 'https://mydomain.com');
    $htmlContent = $response->getBody()->getContents();
    $dom = HtmlDomParser::str_get_html($htmlContent);

Once I had that $dom variable, I could then parse it. I converted the HtmlDomParser code that I found to look like this:

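    // Collect the href of every document link inside a field--item element.
    $pdfLinks = [];
    foreach ($dom->find('.field--item') as $pdf) {
        foreach ($pdf->find('a') as $anchor) {
            // str_contains() requires PHP 8.0 or later.
            if (str_contains($anchor->href, '/document')) {
                $pdfLinks[] = $anchor->href;
            }
        }
    }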

This code looks for elements with the class “field--item” and, inside them, for links whose href contains “/document”. In Drupal, “field--item” is a common class, used to divide up sections of the HTML. You may get more than you need; in that case, make your selectors more specific. Using the PHP parser tool, I was able to populate an array with links and titles that I used to create records for the Algolia server.
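
The last step, turning the collected links and titles into search records, might look something like the sketch below. It is an assumption-laden illustration rather than the code used on the client site: the helper name buildPdfRecords, the sample links, and the title, url, and type fields are all hypothetical. The one name taken from Algolia is objectID, the attribute Algolia uses to identify a record.

```php
<?php
// Hypothetical helper: turn scraped link data into de-duplicated records
// ready to send to a search index.
function buildPdfRecords(array $links, string $baseUrl): array
{
    $records = [];
    foreach ($links as $link) {
        $href = trim($link['href']);
        // Make site-relative links absolute so the record works off-site.
        $url = str_starts_with($href, '/')
            ? rtrim($baseUrl, '/') . $href
            : $href;
        // Key by URL so a document linked twice on the page yields one record.
        $records[$url] = [
            'objectID' => md5($url),          // stable ID derived from the URL
            'title'    => trim($link['title']),
            'url'      => $url,
            'type'     => 'pdf',              // assumed field, useful as a filter
        ];
    }
    return array_values($records);
}

// Invented sample data in the shape the parsing loop could produce.
$links = [
    ['href' => '/document/annual-report', 'title' => ' Annual Report '],
    ['href' => '/document/annual-report', 'title' => 'Annual Report'],
    ['href' => 'https://mydomain.com/document/bylaws', 'title' => 'Bylaws'],
];

$records = buildPdfRecords($links, 'https://mydomain.com/');
```

Here the duplicate link collapses into a single record, so $records holds two entries. Deriving objectID from the URL means that re-running the scrape updates the existing records instead of creating new ones.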

What tools do I need for the talk? I need to set up an instance of Drupal. Perhaps I will use Drupal Starshot: “Starshot aims to build the new default download of Drupal. A package built on Drupal core, including refined common features from the contributed project ecosystem to create a great user experience out of the box.” For search, it would be nice to use both Algolia and Typesense (free levels) so that I can compare the two. Or maybe I will just use Typesense, as I am already quite familiar with Algolia. Stay tuned for more adventures in preparing a tech talk.