Search Solutions and Tutorials 2022

Innovations in Search & Information Retrieval.

Search Solutions is the BCS Information Retrieval Specialist Group’s annual event focused on practitioner issues in the arena of search and information retrieval.

Search Solutions consists of two parts: a tutorial day and a conference day.

We bring together practitioners, researchers, analysts and end users to discuss the latest developments in the information retrieval (IR) community and to share insights between research and practice.

Tutorials – 22 November

Tutorials are either full day (5-6 hours including breaks and lunch) or half day (2-3 hours including breaks). The tutorials took place on Tuesday 22nd November 2022 at the BCS offices in London.

Tutorial 1 - Full day

IR From Bag-of-words to BERT and Beyond through Practical Experiments

Tutor(s):

Sean MacAvaney, University of Glasgow, Email: sean.macavaney@glasgow.ac.uk
Craig Macdonald, University of Glasgow, Email: craig.macdonald@glasgow.ac.uk
Nicola Tonellotto, University of Pisa, Email: nicola.tonellotto@unipi.it

Introduction:

In this tutorial, you will build up your knowledge of information retrieval from the basics up to the latest BERT-based techniques. Moreover, hands-on exercises will give you practical experience using these techniques. By the end of the tutorial, you will be familiar with the latest techniques, including neural language models for re-ranking, learned sparse retrieval, and dense retrieval.

Schedule:

10:00 - 11:00 Part 1 Presentation (a): Indexing, Retrieval, Evaluation
11:00 - 11:15 Morning break
11:15 - 11:45 Part 1 Presentation (b): Learning-to-rank
11:45 - 12:15 Part 1 Lab
12:15 - 13:00 Lunch break
13:00 - 14:00 Part 2 Presentation: Neural re-ranking
14:00 - 14:30 Part 2 Lab
14:30 - 14:45 Afternoon break
14:45 - 15:45 Part 3 Presentation: Learned sparse retrieval, Dense retrieval
15:45 - 16:15 Part 3 Lab

Tutorial logistics/materials:

Attendees are required to bring their own computers/laptops for the lab component. All materials (e.g., datasets, models) are automatically downloaded. The preferred platform for running the labs is Google Colab, though participants may also run the exercises locally if they prefer. Materials (slides and Colab notebooks) will be accessible to attendees in a public GitHub repository*.
* Examples from previous iterations of the tutorial: https://github.com/terrier-org/ecir2021tutorial, https://github.com/terrier-org/cikm2021tutorial

Tutorial 2 - AM

Approaching Neural Search with Apache Solr and Open-source technologies

Tutor(s):

Alessandro Benedetti, CEO @ Sease Ltd, Apache Lucene/Solr Committer, Apache Solr PMC Member, Email: a.benedetti@sease.io

Introduction:

Please join us to explore neural search, an exciting new Apache Solr feature, and learn how you can leverage it to improve your search experience!

Schedule:

9:00 - 9:20 - Introduction to Semantic Search Problems (vocabulary mismatch problem, semantic similarity)
9:20 - 9:40 - From Text to Vectors (Sparse vs Dense vector representation)
9:40 - 10:10 - How Approximate Nearest Neighbor (ANN) approaches work, with a focus on Hierarchical Navigable Small World Graph (HNSW)
10:10 - 10:40 - How the Apache Lucene implementation works
10:40 - 11:10 - How the Apache Solr implementation works, with the new field type and query parser introduced
11:10 - 11:30 - Break
11:30 - 12:00 - How to run KNN queries and how to use them to rerank a first-stage pass
12:00 - 12:35 - How to generate vectors from text and integrate large language models with Apache Solr
12:35 - 13:05 - Limitations and how to mitigate them
13:05 - 13:20 - Future Work

Tutorial logistics/materials:

Attendees are required to bring their own computers/laptops for the lab component. Slides and code snippets will be provided.

Tutorial 3 - PM

Simplifying NLP researchers' work with Datafari Open Source

Tutor(s):

Julien Massiera, France Labs, Email: julien.massiera@francelabs.com
Cedric Ulmer, France Labs, Email: cedric.ulmer@francelabs.com

Introduction:

NLP researchers need to manipulate text, and their aim is to find the best way to analyse it. Quite often, however, they first have to deal with the time-consuming task of extracting the text from source documents. This step contributes nothing to their research, but it is necessary. Then, if they work with machine learning algorithms for instance, they need to test their algorithms on actual data, which is again time-consuming. This is where Datafari comes into play. Datafari is one of the few available open-source Enterprise Search solutions. It covers the necessary steps, from crawling document sources to indexing and searching, including text extraction. With it, attendees, in particular NLP researchers, will have an open-source toolbox that simplifies their work and lets them focus on their actual research.

Schedule:

14:00 - 14:30 Understanding Datafari, its architecture and its components
14:30 - 15:00 Installing Datafari
15:00 - 16:00 Going through use case A: using Datafari to easily extract text from multiple sources and multiple formats, and retrieving the output as either raw text files or within a Solr search index
16:00 - 16:20 Break
16:20 - 17:20 Going through use case B: using Datafari to add an NLP step in the document crawling pipeline and retrieving the output entities as a new field in a Solr search index
17:20 - 17:45 Wrap up and questions

Tutorial logistics/materials:

Attendees are required to bring their own computers/laptops for the lab component.

Option 1: You can do the full tutorial on your own laptop if you are able to run a Linux OS (either directly on the machine, through a VM, or in a Docker container). The system must have at least 12 GB of RAM dedicated to Datafari, a CPU of at least 1 GHz, and at least 20 GB of disk space available, if possible on an SSD.

Option 2: You can do the tutorial on a remote Linux system hosted by France Labs, using your laptop to connect to it. For this, you will need internet connectivity and the ability to connect via SSH to a remote system (an SSH client is natively included in Linux systems; on Windows systems you will need a client such as PuTTY or MobaXterm).

Tutorial 4 - Full day

Diverse Approaches to Systematic Searching

Tutor(s):

Dr Farhad Shokraneh
Institute of Health Informatics, University College London, London, UK
Centre for Neuromuscular Diseases, National Hospital for Neurology and Neurosurgery, UCLH, London, UK
King's Technology Evaluation Centre, King's College London, London, UK
Division of Psychiatry and Applied Psychology, University of Nottingham, UK
School of Medicine, University of Central Lancashire, Preston, UK
Systematic Review Consultants, Nottingham, UK
Email: FarhadShokraneh@gmail.com
Isla Kuhn - Head of Medical Library Services at the University of Cambridge

Introduction:

Searching for literature review purposes can follow different steps, methods, and approaches depending on the complexity of the topic, the availability of time, human, machine and information resources, and the type and purpose of the review. Since there are several types of reviews and evidence synthesis, one size does not fit all, and searchers need to tailor their search methods and approaches. The focused context of the tutorial will be biomedical and health sciences.

Introducing the search pyramid concept for the first time, this tutorial will show the reverse progress of search systems to simplify search string development. Furthermore, the tutorial will take a deeper dive into the approaches that users take to start, continue and finish their searches, in order to classify the existing approaches, including, but not limited to, minimalist vs maximalist approaches, translational reductions, internal and external validation, pre-search, in-search, and post-search filtering, structural performance tests, peer-review, historical methods, and scoping methods. Each of these methods will be discussed to reveal its best use cases, advantages, disadvantages, and the road toward its future development. The participants will have a chance to practice most of the approaches during the tutorial. There are no prerequisites for this tutorial; anyone can benefit from the content.

Schedule:

10:00 – 10:20 Introduction to systematic reviews, evidence synthesis, and systematic search
10:20 – 10:45 Typology of evidence synthesis
10:45 – 11:00 Questions and Short Break
11:00 – 11:30 The process of developing the search methods
11:30 – 11:50 The steps to developing a search strategy
11:50 – 12:00 Short Break
12:00 – 13:00 Practice 1: Developing a search strategy
13:00 – 14:00 Lunch Break
14:00 – 14:20 Analysing problem scenarios to develop search methods
14:20 – 14:40 Information needs that trigger a specific approach to systematic searching
14:40 – 14:50 Search pyramid
14:50 – 15:00 Short Break
15:00 – 15:30 Diversity of search functions and tools in databases
15:30 – 16:00 Practice 2: Matching the scenario to the approaches
16:00 – 16:30 Discussions and Closing

Tutorial logistics/materials:

Attendees are required to bring their own computers/laptops for the lab component. The practice materials and slides will be printed and presented to the participants.

Search Solutions conference 2022 – 23 November

Session 1: The search experience: Focus on the users

10:00 - 10:15 Introduction
10:15 - 10:45 Natasha den Dekker (LexisNexis) - How to conduct empathetic user research to test the search experience of users?
10:45 - 11:15 Amy Walduck (State Library of Queensland) - The Topography of Searching: Visualising search data
11:15 - 11:45 Break

Session 2: Beyond keyword search: Semantic/conversational/audio search

11:45 - 12:15 Brammert Ottens (Spotify) - Finding the Right Audio Content for You
12:15 - 12:45 Mohamed Yahya (Bloomberg) - Taking Question Answering from Research Prototype to Product
12:45 - 13:15 Filip Radlinski (Google) - Challenges with Really Understanding Natural Language in Conversational Recommendation
13:15 - 14:15 Lunch

Session 3: Search with an impact: Searching health-related information

14:15 - 14:45 Farhad Shokraneh (Institute of Health Informatics, University College London) - The Futures of Systematic Searching
14:45 - 15:15 Gavin Moore & Andrew Doyle (University Hospitals Coventry & Warwickshire NHS Trust) - A Programmable Search – A Solution to Finding Guidelines and Patient Information?
15:15 - 15:30 Break

Session 4: A world beyond web search: Enterprise search

15:30 - 16:00 Julien Massiera & Cedric Ulmer (France Labs) - Combining Spacy with Datafari Community Edition to enable semantic Enterprise Search
16:00 - 16:30 Phil Lewis (Pureinsights) - Practical Applications of Knowledge Graphs and AI in Search
16:30 - 17:00 Lightning Talks (feel free to step up and present YOUR five-minute talk)
17:00 - 17:30 Our traditional fishbowl session
17:30 - 17:45 BCS Search Industry Awards
17:45 - Drinks / BCS-IRSG AGM starts at 18:00

Organisers

Ingo Frommholz
Frank Hopfgartner
Udo Kruschwitz
Tony Russell-Rose
Martin White
Haiming Liu (tutorials chair)

Contact

For further details, contact irsg@bcs.org.uk.