Nextcloud supports search via the web interface and in the clients. By default, however, only file names are matched; file contents are not searched. A full-text search can be set up as an option.
Installing Elasticsearch in Ubuntu/Debian
Java runtime environment
Nextcloud’s full-text search is based on Elasticsearch, which needs to be installed separately. Elasticsearch is a Java-based search engine, so the first step is to ensure that Java is available. You can check whether Java is already installed with the following command:
java -version
As a result you should get output similar to this:
openjdk version "11.0.13" 2021-10-19
OpenJDK Runtime Environment (build 11.0.13+8-Ubuntu-0ubuntu1.18.04)
OpenJDK 64-Bit Server VM (build 11.0.13+8-Ubuntu-0ubuntu1.18.04, mixed mode, sharing)
If you instead get a message like Command 'java' not found, you have to install a Java runtime environment, for example on Ubuntu Linux as follows:
apt install openjdk-11-jre
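The check and the install hint can also be combined in a few lines of shell. This is only a sketch; the helper name have_cmd is made up for illustration, and the package name is the one used above:

```shell
# Hypothetical helper: succeeds if the given command is available on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd java; then
    # Java is present: show which version is installed.
    java -version
else
    echo "java not found; install it with: apt install openjdk-11-jre"
fi
```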
After that, add the repository for Elasticsearch and install Elasticsearch as follows:
apt install apt-transport-https ca-certificates
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | apt-key add -
echo "deb https://artifacts.elastic.co/packages/7.x/apt stable main" | tee -a /etc/apt/sources.list.d/elasticsearch7.list
apt update
apt install elasticsearch -y
Before you start Elasticsearch, you should definitely adjust the configuration to limit the size of the heap – otherwise Elasticsearch can use all the free memory and other services can be affected.
For this I added the following entry in the file /etc/default/elasticsearch to limit the heap size to 4 GB:
Note: the server used has about 32 GB of RAM. You may need to choose a lower value if you do not have that much memory available. The more memory Elasticsearch can use, the more effectively it works, since less data has to be reloaded during operation.
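A minimal sketch of such an entry, assuming the ES_JAVA_OPTS variable that the Elasticsearch 7 Debian package reads from /etc/default/elasticsearch. The exact line used on the server described here may have looked different:

```shell
# /etc/default/elasticsearch
# Assumed entry: fix both the initial (-Xms) and the maximum (-Xmx) heap at 4 GB.
ES_JAVA_OPTS="-Xms4g -Xmx4g"
```

Setting both values to the same size avoids heap resizing at runtime. On a machine with less memory, both numbers would be lowered together, for example to 1g.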
Some instructions also point out that the IP address for incoming connections should be set to 127.0.0.1 in the /etc/elasticsearch/elasticsearch.yml file. However, this is not necessary for Elasticsearch 7, since without a configured IP address Elasticsearch can only be reached locally. To be on the safe side, you should at least check the setting and, if necessary, comment out network.host or set it to 127.0.0.1:
# ---------------------------------- Network -----------------------------------
#
# By default Elasticsearch is only accessible on localhost. Set a different
# address here to expose this node on the network:
#
#network.host: 192.168.0.1
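The check can be done quickly with grep. The following is a sketch under the assumption that the configuration file is at the path mentioned above; the function name active_network_host is hypothetical:

```shell
# Hypothetical helper: print active (uncommented) network.host lines of a
# config file. Succeeds only if at least one such line exists.
active_network_host() {
    grep -E '^[[:space:]]*network\.host[[:space:]]*:' "$1"
}

conf=/etc/elasticsearch/elasticsearch.yml
if [ -r "$conf" ]; then
    # No active line means the default applies (localhost only).
    active_network_host "$conf" || echo "network.host is not set: localhost only"
fi
```

If the function prints anything other than 127.0.0.1 or localhost, the node may be reachable from the network and the setting should be reviewed.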
To be able to search the content of PDF documents as well, you need to install an additional plugin:
/usr/share/elasticsearch/bin/elasticsearch-plugin install ingest-attachment
Tesseract for OCR
In addition to text documents, images can also be searched for readable text. The “Tesseract” tool is required for this, which can be installed as follows with support for German and English:
apt install tesseract-ocr tesseract-ocr-deu tesseract-ocr-eng
Important: OCR is a time-consuming process. If you have a lot of pictures in your Nextcloud, the first build of the search index will take a long time!
Setting up Elasticsearch as a service
After all preparations are completed, you can activate Elasticsearch as a service as follows:
systemctl daemon-reload
systemctl enable elasticsearch
systemctl start elasticsearch
Starting the service may take a while; in my case it took about 20 seconds.
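Because of this delay, it can help to wait until Elasticsearch actually answers before continuing. A sketch, assuming the default HTTP port 9200 on localhost without authentication and that curl is installed; the function name wait_for_http is made up:

```shell
# Hypothetical helper: poll a URL once per second until it answers
# successfully or the given number of attempts (default 30) is used up.
wait_for_http() {
    url=$1
    tries=${2:-30}
    while [ "$tries" -gt 0 ]; do
        if curl -fsS -o /dev/null "$url" 2>/dev/null; then
            return 0
        fi
        tries=$((tries - 1))
        # Only sleep if another attempt follows.
        [ "$tries" -gt 0 ] && sleep 1
    done
    return 1
}

# Usage (assumed default address):
#   wait_for_http http://localhost:9200 60 && curl -s http://localhost:9200
```

When the service is up, the second curl call should print a short JSON document with the node name and version number.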
Now Elasticsearch is available for full-text search in text documents including PDF and Office documents like Word and LibreOffice/OpenOffice.
Setting up full-text search in Nextcloud
After Elasticsearch and, if applicable, Tesseract are installed, install the following apps in Nextcloud:
And the following app if you want to use Tesseract to search for text in images:
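The apps can also be installed from the console with occ. The sketch below assumes the app IDs under which the full-text search apps are published in the Nextcloud app store (fulltextsearch, files_fulltextsearch, fulltextsearch_elasticsearch and, for OCR, files_fulltextsearch_tesseract) and a web server user named www-data. The function name install_fts_apps is hypothetical:

```shell
# Hypothetical helper: install the three core full-text search apps plus any
# additional app IDs passed as arguments. Stops at the first failure.
# Assumes the web server runs as www-data (typical on Debian/Ubuntu).
install_fts_apps() {
    for app in fulltextsearch files_fulltextsearch fulltextsearch_elasticsearch "$@"; do
        sudo -u www-data php occ app:install "$app" || return 1
    done
}

# Usage, from the main directory of Nextcloud:
#   install_fts_apps                                  # without OCR
#   install_fts_apps files_fulltextsearch_tesseract   # with OCR
```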
Configuring the full text search
The configuration can be reached via the administration settings. The following information must be entered there.
In the section “Elasticsearch”
- Address of the servlet:
- Index: Name of the index, for example the domain name of your Nextcloud
- Analyzer tokenizer:
Changing the tokenizer is usually not needed. To see which tokenizers are available and how they work, see the Elasticsearch documentation.
In the section “Files”
Here you can activate the inclusion of PDF and Office documents in the index and, if necessary, adjust the maximum file size up to which documents are included in the index.
In the section “Files – Tesseract OCR”
If you also installed Tesseract, you can activate OCR here. For the languages, enter all the languages that you have installed for Tesseract, separated by commas, e.g. deu,eng.
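The language list can be derived from what Tesseract itself reports. A sketch, assuming that tesseract --list-langs prints a header line starting with "List of" followed by one language code per line; the function name langs_to_csv is made up:

```shell
# Hypothetical helper: turn the output of "tesseract --list-langs" (read from
# stdin) into a comma-separated list. Drops the header line and the "osd"
# entry, which is orientation/script detection data rather than a language.
langs_to_csv() {
    grep -v -e '^List of' -e '^osd$' | paste -sd, -
}

if command -v tesseract >/dev/null 2>&1; then
    # 2>&1 because some versions print the list to stderr.
    tesseract --list-langs 2>&1 | langs_to_csv
fi
```

With the German and English packages from above installed, this should yield the value to paste into the languages field.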
Exclude folders from the search
To exclude folders from the search, just add a file named .noindex to these folders. Also see https://help.nextcloud.com/t/how-to-exclude-a-folder-from-indexing/35318/2.
Generating the search index
The initial build of the search index is started on the console with the following command, run in the main directory of Nextcloud:
php occ fulltextsearch:index
Depending on the amount of data available, this process can take several hours. Therefore, when accessing the server via SSH, it makes sense to use tools like tmux so that you can run the command in the background without having to keep the connection to the server open all the time.
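With tmux this could look roughly as follows. The sketch assumes a web server user named www-data and takes the Nextcloud directory as a parameter; the function name and the default session name fts-index are made up:

```shell
# Hypothetical helper: start the initial index run in a detached tmux session.
#   $1 = main directory of Nextcloud, $2 = session name (default: fts-index)
# Assumes the web server runs as www-data.
start_index_session() {
    nc_dir=$1
    session=${2:-fts-index}
    tmux new-session -d -s "$session" -c "$nc_dir" \
        'sudo -u www-data php occ fulltextsearch:index'
}

# Usage (the path is an assumption):
#   start_index_session /var/www/nextcloud
#   tmux attach -t fts-index    # watch the progress; detach with Ctrl-b d
```

The session keeps running after the SSH connection is closed, and tmux attach picks it up again after the next login.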
Activating Cron in Nextcloud
When the process is complete, new files and file changes will be added to the index automatically as part of Nextcloud’s cron job. For this to work, however, the background jobs must actually run via cron; AJAX or Webcron is not sufficient!
See the Nextcloud documentation for how to set up cron.
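As a rough orientation, the crontab entry for the web server user usually looks like the line generated below. This is a sketch: the five-minute interval follows the Nextcloud documentation, while the install path /var/www/nextcloud and the function name cron_line are assumptions:

```shell
# Hypothetical helper: print a crontab line that runs Nextcloud's cron.php
# every five minutes. $1 = main directory of Nextcloud.
cron_line() {
    printf '*/5 * * * * php -f %s/cron.php\n' "$1"
}

cron_line /var/www/nextcloud
```

The printed line goes into the crontab of the web server user, which on Debian/Ubuntu can typically be edited with crontab -u www-data -e. In addition, the background job mode in the Nextcloud administration settings has to be set to Cron.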
Testing the search
After setting up and building the search index, you can test the search by clicking the search icon in the web interface and entering a term that you know appears in your documents.
One or more entries under “Full-text search” should then appear in the list of results:
The full-text search is then also supported in the Android client.
Update the PDF plugin when updating Elasticsearch
Elasticsearch is also updated as part of regular updates with apt update and apt upgrade. After such an update, the service may no longer work because the plugin for the PDF import no longer matches the server version.
In this case you need to remove and install the plugin again:
/usr/share/elasticsearch/bin/elasticsearch-plugin remove ingest-attachment
/usr/share/elasticsearch/bin/elasticsearch-plugin install ingest-attachment
After that you can restart Elasticsearch:
systemctl restart elasticsearch
Using with Nextcloud 25
Currently (as of mid-October 2022), the apps for full-text search are not yet marked as compatible with Nextcloud 25. However, in my experience they work without any problems if you manually enable the apps and then activate them.
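On the console, one way to do this is the --force option of occ app:enable, which enables an app regardless of the Nextcloud version it declares support for. A sketch, assuming a web server user named www-data; the function name force_enable_apps is made up:

```shell
# Hypothetical helper: enable the given apps even if they are not marked as
# compatible with the installed Nextcloud version. Stops at the first failure.
# Assumes the web server runs as www-data.
force_enable_apps() {
    for app in "$@"; do
        sudo -u www-data php occ app:enable --force "$app" || return 1
    done
}

# Usage, from the main directory of Nextcloud (app IDs as assumed in the app store):
#   force_enable_apps fulltextsearch files_fulltextsearch fulltextsearch_elasticsearch
```

Since this bypasses the compatibility check, it is worth testing the search afterwards and removing the workaround once the apps are officially released for the new version.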