
Check the container! – Solr is moving

The Apache Solr search server and its integration into TYPO3 have always been part of the technology stack for the majority of our customer projects. Solr offers a powerful full-text search with phonetic matching that can index and make searchable not only the editorial page content in TYPO3, but also news and glossary entries as well as any other record types. Technically, the search consists of two parts: the actual search server, a Java application based on the Lucene library, and a TYPO3 extension based on the PHP library Solarium, which communicates with the search server and pushes changes to the data in TYPO3 to it. This ensures that the search index is always up to date. Some people find this architecture confusing, because in conversation it is often not entirely clear whether someone saying “Solr” means the Apache Solr search server or the TYPO3 extension “solr.” The distinction matters most when it comes to version numbers, since the two are versioned independently.
The language of the content is, of course, crucial to how Apache Solr works, because a number of parameters that shape the behavior of the search depend on it. For example, Solr performs automatic stemming, reducing words to their stem so that the plural “trees” is matched to the singular “tree.” It also takes phonetic similarity into account, so that a search for “Stephan” also finds entries with “Stefan,” for example. This is achieved by creating at least one so-called core per language in Solr. A core behaves similarly to a database in MySQL: it stores data in exactly one defined language, with its own phonetic rules and stop word list.
We also create a separate core for each site in TYPO3 installations with multiple sites, even if the sites share the same language. This has advantages for non-public areas such as style guides or playgrounds, whose content then stays in its own index instead of mixing with that of the public site.

Solr and Mittwald
Many of our projects are operated by our partner Mittwald. From the outset, we relied on Mittwald's hosted Solr service, which ran the Solr search server on Mittwald's infrastructure. However, this service runs on Mittwald's older (non-cloud-based) infrastructure, which is gradually being phased out: existing services remain available, but new plans can no longer be booked. A new solution was therefore needed.
Our customer projects at Mittwald, however, already run on the newer cloud infrastructure, which is managed via the mStudio management interface. One of the newer features available there is container hosting, which allows a project to start Docker containers and access them. Since Solr is available as an official Docker image, it made sense for us to run Solr as a container alongside each project to keep providing the search functionality. This has the added advantage that, technically, the search server runs closer to the application, which reduces access latency and promises slightly better performance overall. Finally, separately booked Solr plans always came with additional costs; container hosting is significantly cheaper in direct comparison.
Our general approach
The idea of moving the Solr instances was thus born. The next step was to develop a way to recreate the various Solr plans as containers. Subsequently, each affected TYPO3 installation had to be given the new URL for the Solr server and new credentials, and then completely reindexed once so that the new Solr search server could be filled with content. We took the opportunity to upgrade all Solr instances to the latest software version at the same time.
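After the full reindex, it is worth checking that every expected core in the new container actually received documents. The following is a minimal Python sketch of such a check, not part of our actual tooling. It assumes a response in the shape returned by Solr's CoreAdmin API (`/solr/admin/cores?action=STATUS&wt=json`), whose `status` object lists each loaded core with an `index.numDocs` counter; fetching that response (with credentials) is left out, and the helper name `check_cores` is hypothetical.

```python
import json


def check_cores(status_json: str, expected: list[str]) -> dict[str, str]:
    """Report every expected core that is missing or still empty.

    `status_json` is the body of a CoreAdmin STATUS response, i.e. of
    GET /solr/admin/cores?action=STATUS&wt=json.
    """
    status = json.loads(status_json).get("status", {})
    problems: dict[str, str] = {}
    for core in expected:
        info = status.get(core)
        if not info:
            # The core is not known to (or not loaded by) the new container.
            problems[core] = "missing"
        elif info.get("index", {}).get("numDocs", 0) == 0:
            # The core exists, but reindexing has not filled it yet.
            problems[core] = "empty"
    return problems


if __name__ == "__main__":
    # Made-up sample response, trimmed to the fields the check reads.
    sample = json.dumps({
        "status": {
            "www_marketing_factory_de_de_de": {"index": {"numDocs": 1250}},
            "www_marketing_factory_com_en_us": {"index": {"numDocs": 0}},
        }
    })
    print(check_cores(sample, [
        "www_marketing_factory_de_de_de",
        "www_marketing_factory_com_en_us",
        "sg_marketing_factory_de_de_de",
    ]))
    # {'www_marketing_factory_com_en_us': 'empty', 'sg_marketing_factory_de_de_de': 'missing'}
```

An empty result means every core exists and contains at least one document.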
Operating Solr as a container involves a little extra work. For example, we have to make sure that the configuration required by TYPO3 is loaded into the search server, and the required Solr cores must be created. Starting from the official Solr Docker image, several XML files containing the field configuration and the basic configuration of Solr itself are placed in its data directory.
We already use Terraform successfully to manage part of our infrastructure automatically. Fortunately, there is a Terraform provider for mStudio that allows us to create resources there programmatically. This also allows us to make the Solr administration interface accessible on internally used host names, including authentication. The approach gives us infrastructure as code and a solution we can keep using in the long term, even when new site languages – and thus Solr cores – are added.
Mittwald itself has recognized the need to create Solr instances programmatically on the cloud infrastructure and provides a corresponding Terraform module that shows one possible way to implement this. The module starts a container based on the official Solr Docker image and then adds the configset from the repository of the TYPO3 extension solr, so that the configuration can also be updated later. A typical project configuration now looks like this:
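One way to create the cores is Solr's CoreAdmin API (`action=CREATE`). The Python sketch below only builds the request URLs and is not how the Mittwald module necessarily does it. It assumes a TYPO3-style configset that ships one schema per language under `<language>/schema.xml` (the layout we understand EXT:solr's configsets to use); the base URL `http://solr:8983/solr`, the configset name `ext_solr_13_0_0` and the helper `create_core_urls` are hypothetical placeholders.

```python
from urllib.parse import urlencode

# Hypothetical placeholders: adjust to your container and configset.
SOLR_BASE_URL = "http://solr:8983/solr"
CONFIGSET = "ext_solr_13_0_0"


def create_core_urls(cores: dict[str, str], base_url: str = SOLR_BASE_URL,
                     configset: str = CONFIGSET) -> list[str]:
    """Build one CoreAdmin CREATE URL per core.

    `cores` maps core name -> schema language (e.g. "german"), mirroring the
    `solr_cores` map in the Terraform locals.
    """
    urls = []
    for name, language in sorted(cores.items()):
        params = {
            "action": "CREATE",
            "name": name,                        # instanceDir defaults to the name
            "configSet": configset,              # shared configset (solrconfig.xml etc.)
            "schema": f"{language}/schema.xml",  # assumed per-language schema path
            "wt": "json",
        }
        urls.append(f"{base_url}/admin/cores?{urlencode(params)}")
    return urls


if __name__ == "__main__":
    for url in create_core_urls({"www_marketing_factory_de_de_de": "german",
                                 "www_marketing_factory_com_en_us": "english"}):
        print(url)
```

Once authentication is enabled, these requests need credentials like any other call to the instance.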
```hcl
locals {
  # Mittwald project (UUID) in which the Solr container is created
  mittwald_project_id = "02159b81-c0bb-4f62-ae57-8dd2e6bd5113"
  solr_version        = "9.9"
  solr_heap           = "1g" # Java heap size for Solr
  solr_hostname       = "solr.live.mfc.invalid"

  # user name = "<base64 password hash> <base64 salt>"; the last entry is used by TYPO3
  solr_users = {
    cs  = "9PxfZq9si08ebsxB1pwOylsB6uw3nCAkKz4dlKPzfws= ZGtkNHhyNWE5czAxN2xhZA=="
    mp  = "2/JqksIj3gA3lXgA7E9BuW9D4t8Ki+x/yWLRUGcFFBY= bjljYzdzbnYwNGVxcTB5ag=="
    sk  = "wE0pHBx3d0osQUP2i2LSDiGeqfJhSuc1viaWB5+y6Ao= cWIwZTY3cmNkbDh2aXB1Ng=="
    mfd = "Slw0Dts2bGMS7hKEyuXSxYMpahP8eaAF/3wAXyluwiQ= ZHJlcHlhcGwwcHlpZXBvcQ=="
  }

  # one core per site and language, each with its content language
  solr_cores = {
    "www_marketing_factory_de_de_de" = {
      language = "german"
    }
    "www_marketing_factory_com_en_us" = {
      language = "english"
    }
    "sg_marketing_factory_de_de_de" = {
      language = "german"
    }
    "sg_marketing_factory_com_en_us" = {
      language = "english"
    }
  }
}
```
It defines the Solr version, the Java heap size, the host name, the Solr cores to be set up including their language, and the users who should have access together with their password hashes. The last user in the list is the one TYPO3 itself uses. At the top, the Mittwald project in which everything is created is specified by its project ID, a UUID that uniquely identifies each project in the cloud infrastructure.
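The core names in the example appear to follow a simple pattern: the site's host name plus its locale, with every separator turned into an underscore. This is our reading of the example names, not a documented convention of EXT:solr or of the Mittwald module. The Python sketch below reproduces it; the helper `core_name` and the `sg.marketing-factory.*` host names are hypothetical.

```python
import re


def core_name(host: str, locale: str) -> str:
    """Derive a core name from a site's host name and locale (hypothetical convention).

    The input is lowercased and every run of characters other than ASCII
    letters and digits becomes a single underscore, e.g.
    ("www.marketing-factory.de", "de_DE") -> "www_marketing_factory_de_de_de".
    """
    raw = f"{host}_{locale}".lower()
    return re.sub(r"[^a-z0-9]+", "_", raw).strip("_")


# (host name, locale) -> content language; sample data modelled on the example
SITES = {
    ("www.marketing-factory.de", "de_DE"): "german",
    ("www.marketing-factory.com", "en_US"): "english",
    ("sg.marketing-factory.de", "de_DE"): "german",
    ("sg.marketing-factory.com", "en_US"): "english",
}

if __name__ == "__main__":
    for (host, locale), language in SITES.items():
        print(core_name(host, locale), language)
```

Deriving names this way keeps them predictable when a new site or language is added to the `solr_cores` map.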
Security considerations
One aspect must not be forgotten: security. The Solr instances previously provided by Mittwald ran behind a reverse proxy that also handled authentication via HTTP Basic. The Solr container itself does not provide this out of the box. Since TYPO3 communicates with the container over an internal, private network, this is not an issue for the application itself. However, you will usually also want to access the Solr admin interface from the outside. Because the cloud infrastructure has no built-in option for enabling authentication on domains (Mittwald delegates this task to the application behind them), we enabled authentication in Solr itself.
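On the client side, HTTP Basic authentication (RFC 7617) simply means sending an `Authorization: Basic <base64(user:password)>` header with each request. A minimal Python sketch using only the standard library; the URL, user name and password are placeholders, and the CoreAdmin STATUS endpoint is just an example target.

```python
import base64
import urllib.request


def authorized_request(url: str, user: str, password: str) -> urllib.request.Request:
    """Build a GET request carrying HTTP Basic credentials (RFC 7617)."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


if __name__ == "__main__":
    # Placeholder host and credentials; replace with your own values.
    req = authorized_request(
        "https://solr.live.mfc.invalid/solr/admin/cores?action=STATUS&wt=json",
        "mfd", "not-the-real-password",
    )
    print(req.get_header("Authorization"))
    # To actually send it: urllib.request.urlopen(req, timeout=10).read()
```

Basic credentials are only base64-encoded, not encrypted, so external access to the admin interface should always go over HTTPS.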
Authentication is configured in a security.json file stored in the Solr data directory. This file would even allow fine-grained role and permission management, but since we do not need that level of complexity, we do not use it. The user passwords are stored as salted hashes. Useful online tools can generate the required format, including the salt, locally in the browser via JavaScript. Each user can thus hash their own password and commit only the hash to the configuration repository, so the plain-text password stays confidential.
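Such a hash can also be produced offline. The Python sketch below follows the format of Solr's `Sha256AuthenticationProvider` as we understand it: a random salt, SHA-256 over salt plus UTF-8 password, SHA-256 applied once more to that digest, stored as `base64(hash) + " " + base64(salt)`. Treat it as an assumption and verify a generated entry against your Solr version before relying on it; the function names are our own.

```python
import base64
import hashlib
import hmac
import secrets
from typing import Optional


def solr_password_hash(password: str, salt: Optional[bytes] = None) -> str:
    """Return a credentials entry "<base64 hash> <base64 salt>"."""
    if salt is None:
        salt = secrets.token_bytes(32)  # fresh random 32-byte salt
    # First round: SHA-256 over the salt followed by the UTF-8 password.
    digest = hashlib.sha256(salt + password.encode("utf-8")).digest()
    # Second round: SHA-256 over the first digest.
    digest = hashlib.sha256(digest).digest()
    return f"{base64.b64encode(digest).decode()} {base64.b64encode(salt).decode()}"


def check_password(password: str, entry: str) -> bool:
    """Recompute the hash with the stored salt and compare in constant time."""
    stored_hash, salt_b64 = entry.split()
    recomputed = solr_password_hash(password, base64.b64decode(salt_b64)).split()[0]
    return hmac.compare_digest(recomputed, stored_hash)


if __name__ == "__main__":
    entry = solr_password_hash("correct horse battery staple")
    print(entry)  # value for security.json and the Terraform locals
    print(check_password("correct horse battery staple", entry))  # True
```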
For the example above, the security.json looks as follows (the hashes are not real, of course, and were generated just for this example 😉):
```json
{
  "authentication": {
    "blockUnknown": true,
    "class": "solr.BasicAuthPlugin",
    "credentials": {
      "cs": "9PxfZq9si08ebsxB1pwOylsB6uw3nCAkKz4dlKPzfws= ZGtkNHhyNWE5czAxN2xhZA==",
      "mfd": "Slw0Dts2bGMS7hKEyuXSxYMpahP8eaAF/3wAXyluwiQ= ZHJlcHlhcGwwcHlpZXBvcQ==",
      "mp": "2/JqksIj3gA3lXgA7E9BuW9D4t8Ki+x/yWLRUGcFFBY= bjljYzdzbnYwNGVxcTB5ag==",
      "sk": "wE0pHBx3d0osQUP2i2LSDiGeqfJhSuc1viaWB5+y6Ao= cWIwZTY3cmNkbDh2aXB1Ng=="
    },
    "forwardCredentials": false,
    "realm": "Solr administration",
    "scheme": "basic"
  },
  "authorization": {
    "class": "solr.RuleBasedAuthorizationPlugin",
    "permissions": [
      {
        "name": "security-edit",
        "role": "admin"
      }
    ],
    "user-role": {
      "cs": "admin",
      "mfd": "admin",
      "mp": "admin",
      "sk": "admin"
    }
  }
}
```
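Since the user list in the Terraform locals and the `credentials` and `user-role` sections of security.json carry the same information, the file can be generated rather than maintained by hand. A minimal Python sketch, assuming (as in the example) that every user gets the `admin` role and only `security-edit` is bound to a role; `build_security_json` is a hypothetical helper, not part of the Mittwald module.

```python
import json


def build_security_json(users: dict[str, str], realm: str = "Solr administration") -> str:
    """Render a security.json for BasicAuthPlugin + RuleBasedAuthorizationPlugin.

    `users` maps user name -> "<base64 hash> <base64 salt>" entry.
    """
    config = {
        "authentication": {
            "blockUnknown": True,  # reject requests without valid credentials
            "class": "solr.BasicAuthPlugin",
            "credentials": dict(sorted(users.items())),
            "forwardCredentials": False,
            "realm": realm,
            "scheme": "basic",
        },
        "authorization": {
            "class": "solr.RuleBasedAuthorizationPlugin",
            # Only security-edit is tied to a role; everything else just
            # needs a valid login because of blockUnknown.
            "permissions": [{"name": "security-edit", "role": "admin"}],
            # Every user gets the same role: no fine-grained rights.
            "user-role": {name: "admin" for name in sorted(users)},
        },
    }
    return json.dumps(config, indent=2)


if __name__ == "__main__":
    print(build_security_json({
        "cs": "9PxfZq9si08ebsxB1pwOylsB6uw3nCAkKz4dlKPzfws= ZGtkNHhyNWE5czAxN2xhZA==",
        "mfd": "Slw0Dts2bGMS7hKEyuXSxYMpahP8eaAF/3wAXyluwiQ= ZHJlcHlhcGwwcHlpZXBvcQ==",
    }))
```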
Verdict
All previous Solr plans have now been migrated, and the migration went smoothly. The new setup lets us run the services belonging to a project on the same infrastructure as the project itself. Using Terraform has the added advantage that related tasks, such as entering the IP addresses for access to the Solr admin interface at the domain registrar, can be handled with Terraform as well, provided a Terraform provider exists for that registrar. This further reduces manual work and makes the overall process less error-prone.
With container hosting, we can now move other services that currently run alongside the projects as well, provided they are available as Docker images. We already have some ideas in this regard, so stay tuned!