Static content
This might be archived material, for example. The update frequency can be set to once every quarter or less often.

Note: If the WebSphere Portal Search Engine crawler is not able to crawl your site, the crawlers of Internet search engines such as Yahoo! and Google will not be able to either.
54 WebSphere Portal Best Practices

Large pieces of content or documents
The processing of large items such as PDF and zipped files can take a considerable amount of time. It is important to strike a balance between what to index and how often.

The following steps describe how to create a search collection:

1. Create the search collection.
Weigh the features offered against the performance impact they might create. With the non-default, advanced features enabled, the increase in processing time can range from relatively minor (for example, about 30%) to very significant. The increase depends on a number of parameters, such as pages per document, size of text, and size of vocabulary.
However, users might appreciate certain features of the Portal Search Engine, such as the summarizer.
Therefore, you do not have to rule out every feature, but run performance tests, or at least do some calculations, before you promise nice-to-have features.

2. Define one or more content sources.
For each group of a search collection, you need to define a single content source entry and provide adequate definitions and configuration to ensure that the best possible search collection is built.
One of the definitions that you might want to adjust is the number of crawler threads. Adjusting it can save system resources.

3. Initiate the crawler process.
A frequently asked question is: “It seems that the crawler always runs for hours even though it does not really look like it is doing anything, because the hard disk does not seem to be in use. Is this normal?” In this case, you might not have separated your search collections adequately. Proper filters might not have been applied, and the crawler might have run into GB-sized ZIP files, so the document converter is now trying to generate an HTML representation from them.

In summary, the advantages you gain from partitioning your search collection include:
Throughput of data significantly increases.
Your focus should be on content that typically changes often.
Static content, that is, the parts of your site that are not updated very often, can be crawled at very long intervals. This saves system resources.
The processing of large documents can be isolated in a separate process that is scheduled independently of the more rapidly changing content.

Search collections in a cluster scenario
In V5.1, the WebSphere Portal Search Engine builds its search indexes on the file system of the server. Therefore, the search engine itself cannot be clustered. Instead, you can install Portal Search Engine on one of your portal servers and then, from within the cluster, configure WebSphere Portal to use that search service, thereby creating a remote search function. This does mean that search has a single point of failure. In the future, Portal Search
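
The partitioning advice in this section (frequently changing content, static content, and large documents each in their own collection) can be illustrated with a minimal planning sketch. It is not WebSphere Portal code and calls no Portal API: the class name, the function names, the collection names, the 100 MB size threshold, and the daily and monthly intervals are all assumptions made for illustration. Only the quarterly interval for static content comes from the text.

```python
from dataclasses import dataclass
from datetime import timedelta

# Hypothetical threshold: sources holding items this large go to their own collection.
LARGE_ITEM_BYTES = 100 * 1024 * 1024

# Crawl interval per collection. Only the quarterly value for static content
# is taken from the text; the other two values are assumptions.
CRAWL_INTERVALS = {
    "dynamic": timedelta(days=1),
    "static": timedelta(days=90),
    "large-documents": timedelta(days=30),
}


@dataclass(frozen=True)
class ContentSource:
    """Illustrative description of one content source (not a Portal type)."""
    url: str
    changes_often: bool
    largest_item_bytes: int


def assign_collection(source: ContentSource) -> str:
    """Pick a collection so that large items never slow down the frequent crawls."""
    if source.largest_item_bytes >= LARGE_ITEM_BYTES:
        return "large-documents"
    return "dynamic" if source.changes_often else "static"


def plan_collections(sources):
    """Group content source URLs by the collection they are assigned to."""
    plan = {name: [] for name in CRAWL_INTERVALS}
    for source in sources:
        plan[assign_collection(source)].append(source.url)
    return plan


if __name__ == "__main__":
    sources = [
        ContentSource("http://example.com/news", True, 200_000),
        ContentSource("http://example.com/archive", False, 500_000),
        ContentSource("http://example.com/manuals", False, 2 * 1024**3),
    ]
    for name, urls in plan_collections(sources).items():
        print(name, "every", CRAWL_INTERVALS[name].days, "days:", urls)
```

Checking the size before the change frequency is a deliberate choice in this sketch: a source that changes often but contains GB-sized archives would otherwise end up in the frequently crawled collection and cause the long-running crawls described above.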