ANET.at HomepageSearchEngine (HSE) 3.4+ (released on November 9, 2002)
(c) 1999-2002, ANET.at

Core software bundle required for all available version types, including the free
time-limited Trial version of the Pro edition (expires on January 31, 2003).

Homepage: http://www.HomepageSearchEngine.com/ (English)
      or  http://www.HomepageSearchEngine.de/ (German/Deutsch)

                        R E F E R E N C E   M A N U A L

In the current package, support for the following 24 languages is included:
Arabic, simplified Chinese, traditional Chinese, Czech, Danish, Dutch, English,
Finnish, French, German, Greek, Hungarian, Italian, Japanese, Norwegian, Polish,
Portuguese, Romanian, Russian, (Latin) Serbian, Spanish, Swedish, Thai and Turkish.

________

Contents

 1. About this software
 2. System requirements
 3. Which package do I need?
 4. File structure of this package
 5. Quick install guide (for the advanced or impatient user)
 6. Installation Manual
     1. Executable file ("HomepageSearchEngine.exe" on Windows or "HomepageSearchEngine.cgi" on Unix) and libraries
     2. Configuration file ("hse.ini")
     3. IMPORTANT - Admin Area, Creating an Admin account and the Users file
     4. Central Style Sheet definition file ("/hse/HomepageSearchEngine.css")
     5. Static HTML template file ("hse_template.html")
     6. Dynamic HTML template file
     7. Language files ("hse_lang.txt" and "hse_help.txt")
     8. Language and Configuration sub directories / delivery parameters ("lang" and "conf")
     9. Setting the Language Locale - the locale-enabled HomepageSearchEngine Executable
    10. Shell Executable: creating file-lists, indexes and using other tools (Pro edition only)
    11. IMPORTANT - Testing your installation: search for "list:files"
    12. Excluding specific sections within HTML files from being searched
    13. Options to call the search engine
    14. Optional turn from the Trial to the Freeware version with the public key ("hse_key.cgi")
 7. Special Shell Executable features and using the cronjob script (Pro edition only)
     1. Spidering and URL Grabbing: Searching dynamic sites or the content of any URLs
     2. Searching different websites hosted on the same computer
 8. Updating from a previous version
     1. Updating from v3.1
     2. Updating from v3.2
     3. Updating from v3.21
     4. Updating from v3.3 or 3.31
     5. Updating from v3.32
     6. Updating from v3.33
     7. Updating from v3.34
     8. Updating from v3.35
     9. Updating from v3.36
    10. Updating from v3.37
    11. Updating from v3.4
    12. Clean installation and updating from versions earlier than 3.1
 9. Debugging
10. Known issues
11. To-Do's
     1. Internationalisation
     2. New features
12. Support
13. Credits
14. History of version changes ("change log")
15. License agreement

______________________

1. About this software

This software is intended to search the actual content of HTML pages (both static and
dynamically generated ones) and all other text files, including special formats such
as RTF. The resulting output is valid XHTML 1.0. The found files, or the content of
any other URL, can be viewed with all matches highlighted in a style of your choice.

The main purpose is to search medium-sized websites on the Internet or an intranet,
but it may also be used to search documentation or other HTML content on your local
hard disk or even on a CD-ROM.

______________________

2. System requirements

Webspace on a Win32 or supported Unix system with the right to run your own CGI
programs. If the webspace is remotely hosted, access via FTP/SFTP is required for
installation. The webspace can also be on the local hard disk or on a CD-ROM if
webserver software and a webbrowser are (or will be) installed.

On large websites, and for optimal use of the indexing functionality, shell access
(usually via Telnet/SSH) is recommended.

HSE uses about 3-4 MB of memory for basic actions. Once HSE is installed, you can
determine its actual memory usage by executing it with the 'memory' command.
For example, executing

    ./HomepageSearchEngine.cgi memory

via the web based Admin Console on a GNU/Linux 2.4 (i686) system resulted in 2740 KB
for the current version.

___________________________

3. Which package do I need?

Be sure to download the package containing the latest available version for your
target platform from

    http://www.HomepageSearchEngine.com/download_en.phtml

Packages for both Windows and Unix are available; they differ only in the executable
file and its associated libraries (within the "cgi-bin/hse" sub directory).

If you want to use the search engine on Unix, it is strongly recommended to first run
the "platform.cgi" script found in the "cgi-bin/platform" sub directory of this
package or at http://www.HomepageSearchEngine.com/_download/platform.tgz

The distributed package is called "HSEn.n_Platform.ext", where

HSE.............."HomepageSearchEngine"
n.n..............version number (eg. "3.4+")
Platform.........platform the webserver is running on (target platform).
                 Currently, the following 10 platforms are supported:

  Windows platforms:
  ------------------
  "Win32".......Windows 32bit on Intel x86 processors
                (Microsoft Windows XP, 2000, NT, Me, 98, 95)

  Unix and compatible platforms:
  ------------------------------
  "Linux".......GNU/Linux (aka "Linux") v2.x on Intel x86 processors (i386/i586/i686)
                (all current glibc 2 based distributions like Caldera OpenLinux,
                Debian, Mandrake, Red Hat, Slackware, Sun RaQ3 or higher boxes, SuSE,
                TurboLinux, Xandros)
                Some older Linux distributions (including all libc 5 based
                distributions) may need the additional "Linux-old" package; the
                platform.cgi script will determine that.
  "SunOS".......Sun Solaris: SunOS v5.5 or higher (including Solaris v2.5 or higher)
                on Sun SPARC (sun4x series) processors
  "FreeBSD".....FreeBSD v3.x or higher on Intel x86 processors
  "BSD-OS"......BSDi BSD/OS v3.x or higher on Intel x86 processors
  "HP-UX".......HP-UX (Hewlett Packard) v10.x or higher on HP PA-RISC (9000) processors
  "AIX".........IBM AIX (International Business Machines) v4.x or higher on IBM
                processors (Axx)
  "DEC-OSF1"....DEC OSF/1 (Digital Equipment Corporation), including Digital UNIX and
                (as it is currently called) Compaq Tru64 UNIX, v4.0 or higher on DEC
                Alpha processors
  "IRIX"........SGI IRIX64 (Silicon Graphics, Inc.) v6.5 or higher on SGI IP2x
                (IP27 and compatible) processors (eg. Rapidsite systems)
  "MacOSX"......Apple MacOS X v10.x (with its Darwin core system) on Power Macintosh
                (PPC) processors

ext..............Filename extension:

  for Windows target platforms:
  -----------------------------
  "zip".....ZIP-compressed file
            You can unpack it using WinZip or a similar common program (WinRAR,
            7-Zip etc.).

  for Unix target platforms:
  --------------------------
  "tgz".....tape archive (tar) file compressed with GNU zip (gzip)
            If you work on a Windows machine, you can also unpack it using WinZip.
            You can also unpack it directly on the Unix machine by typing the
            following commands:

                gzip -d HSEn.n_Platform.tgz
                tar -xvf HSEn.n_Platform.tar

            (where "n.n" and "Platform" have to be replaced by the actual strings).
            Under MacOS, StuffIt Expander can be used to unpack the package.

Make sure to unpack the package including its sub directories, without truncating
long file names and preserving the case of the file names.
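The naming scheme above can be expressed as a small lookup. The following Python sketch is only an illustration and not the logic of the platform.cgi script: it assumes that the operating system name reported by `uname` (here via Python's `platform.system()`, which reports "Windows" on Win32) is enough to pick the platform token. It ignores the processor type and OS version, and it cannot tell whether the "Linux-old" add-on is needed. The helper names are hypothetical.

```python
import platform

# Assumed mapping from uname system names to the HSE platform tokens listed above.
# Only the OS name is considered; processor type and OS version are not checked.
SYSNAME_TO_TOKEN = {
    "Windows": "Win32",
    "Linux": "Linux",
    "SunOS": "SunOS",
    "FreeBSD": "FreeBSD",
    "BSD/OS": "BSD-OS",
    "HP-UX": "HP-UX",
    "AIX": "AIX",
    "OSF1": "DEC-OSF1",   # Digital UNIX / Tru64 report "OSF1"
    "IRIX": "IRIX",
    "IRIX64": "IRIX",
    "Darwin": "MacOSX",   # the Darwin core system of MacOS X
}


def platform_token(sysname=None):
    """Return the HSE platform token for a uname system name (hypothetical helper)."""
    if sysname is None:
        sysname = platform.system()
    try:
        return SYSNAME_TO_TOKEN[sysname]
    except KeyError:
        raise ValueError(f"no HSE package for system {sysname!r}") from None


def package_name(version, token):
    """Build "HSEn.n_Platform.ext": "zip" for Win32, "tgz" for all Unix platforms."""
    ext = "zip" if token == "Win32" else "tgz"
    return f"HSE{version}_{token}.{ext}"


print(package_name("3.4+", platform_token("Linux")))  # HSE3.4+_Linux.tgz
```

If in doubt, the platform.cgi script remains the authoritative way to determine the package on Unix.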
Currently, the following packages are available:

"HSE3.4+_Win32.zip".......HomepageSearchEngine version 3.4+ for Windows 32bit platforms
"HSE3.4+_Linux.tgz".......HomepageSearchEngine version 3.4+ for GNU/Linux platforms
"HSE3.4+_Linux-old.tgz"...HomepageSearchEngine version 3.4+ add-on for old GNU/Linux platforms
"HSE3.4+_SunOS.tgz".......HomepageSearchEngine version 3.4+ for Sun Solaris platforms
"HSE3.4+_FreeBSD.tgz".....HomepageSearchEngine version 3.4+ for FreeBSD platforms
"HSE3.4+_BSD-OS.tgz"......HomepageSearchEngine version 3.4+ for BSDi BSD/OS platforms
"HSE3.4+_HP-UX.tgz".......HomepageSearchEngine version 3.4+ for HP HP-UX platforms
"HSE3.4+_AIX.tgz".........HomepageSearchEngine version 3.4+ for IBM AIX platforms
"HSE3.4+_DEC-OSF1.tgz"....HomepageSearchEngine version 3.4+ for DEC OSF/1 platforms
"HSE3.4+_IRIX.tgz"........HomepageSearchEngine version 3.4+ for SGI IRIX64 platforms
"HSE3.4+_MacOSX.tgz"......HomepageSearchEngine version 3.4+ for Apple MacOS X platforms

Note that support for the platforms "GNU/Linux-mips", "OpenBSD" and "NetBSD" has been
discontinued as of version 3.4+, since these platforms are rarely used for running a
web server. If you need a package for one of these platforms, you have to use
version 3.4.

The latest packages for the most common platforms are also always available at the
following direct download URLs:

http://www.HomepageSearchEngine.com/_download/HSE_Win32.zip    for Windows 32bit
http://www.HomepageSearchEngine.com/_download/HSE_Linux.tgz    for GNU/Linux
http://www.HomepageSearchEngine.com/_download/HSE_SunOS.tgz    for Sun Solaris
http://www.HomepageSearchEngine.com/_download/HSE_FreeBSD.tgz  for FreeBSD

_________________________________

4. File structure of this package

The package contains 3 main directories. Their contents serve different purposes and
therefore go into different locations on your server machine:

1. the webserver's script (cgi-bin) directory
2. the webserver's document root directory
3. your home directory (outside any directory accessible by the webserver)

1. + cgi-bin         CGI applications. To be put into the webserver's script (cgi-bin) directory.
   |   |
   |   + hse         HSE's program (main) directory - corresponds to the URL "/cgi-bin/hse"
   |   |
   |   + platform    Platform Detector; an optional tool to find out which package you
   |                 need on a Unix platform
   |
2. + htdocs          HTML (web) documents. To be put into the webserver's document root directory.
   |   |
   |   + hse         HSE's web documents directory - corresponds to the URL "/hse"
   |
3. + tools           Tools. Can be put into the user's home directory (outside any
       |             directory accessible by the webserver)
       |
       + hse         HSE's non-web directory

HSE's program directory ("cgi-bin/hse") contains the platform-specific executable file
and some associated libraries, as well as a bundle of platform-independent files. The
filename extension of the executable file is ".exe" in the Windows package and
".cgi.bin" in the Unix package (to be renamed to ".cgi" once residing on the server).

For Windows, the libraries have the ".dll" extension. For a Unix platform, they have
one of the following extensions:

    .so      for AIX, DEC-OSF1, FreeBSD, IRIX, Linux, SunOS
    .o       for BSD-OS
    .sl      for HP-UX
    .bundle  for MacOSX

___________________________________________________________

5. Quick install guide (for the advanced or impatient user)

Assuming you host your site on a Unix platform and you have shell access to this
machine, you can follow these quick instructions. If anything here is unclear, read
through the Installation Manual (section 6) instead.

We assume that you have uploaded the matching package into your home directory, that
your web document root directory is "~/httpd/htdocs" and that your script directory is
"~/httpd/cgi-bin".

(1) On the shell, go into your home directory.
    Unpack and install the package by entering

        gzip -d HSEn.n_Platform.tgz
        tar -xvf HSEn.n_Platform.tar
        cd HSEn.n_Platform
        mv cgi-bin/hse ~/httpd/cgi-bin/hse
        mv htdocs/hse ~/httpd/htdocs/hse
        mv tools/hse ~/hse
        cd ~/httpd/cgi-bin/hse
        chmod 755 HomepageSearchEngine.cgi.bin
        mv HomepageSearchEngine.cgi.bin HomepageSearchEngine.cgi

    After installing, you may want to remove the "HSEn.n_Platform.tar" file and the
    "HSEn.n_Platform" directory.

(2) Open "hse.ini" and configure the 2 directives in its sections 1.1 and 1.2.

(3) Call the URL of HSE,

        http://www.yourdomain.tld/cgi-bin/hse/HomepageSearchEngine.cgi

    and follow the online instructions (regarding the Admin account and the HTML
    template).

(4) "Fine tune" the .ini file (descriptions are included in the file itself).

(5) Test your configuration by calling HSE's URL again and searching for "list:files".

______________________

6. Installation Manual

6.1 Executable file ("HomepageSearchEngine.exe" on Windows or "HomepageSearchEngine.cgi" on Unix) and libraries
---------------------------------------------------------------------------------------------------------------

Win32 users on IIS: please read http://www.HomepageSearchEngine.com/iis_en.phtml first!

Determine a target installation directory on your server. This must be your script
directory (usually called "cgi-bin") or a sub directory of it (it is recommended to
create a new sub directory called "hse"). This program (main) directory of HSE on the
server (usually "/cgi-bin/hse") will be referred to as the "installation directory".

All required files reside in the "cgi-bin/hse" sub directory of the distributed
package. Upload the executable file "HomepageSearchEngine.exe" and all .dll files (the
libraries of a Windows package), or "HomepageSearchEngine.cgi.bin" and all
.so / .o / .sl / .bundle files (the libraries of a Unix package), respectively, into
the target installation directory.
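Which files count as "the executable and all libraries" depends only on the platform's library extension from section 4. The Python sketch below picks them from a directory listing and then performs the Unix rename and chmod 755 step from the quick install guide. It assumes you run it on a local copy of "cgi-bin/hse" (or on the server via shell); the helper names and the example library name "libhse.so" are hypothetical.

```python
import os

# Library extension per platform token, as listed in section 4.
LIB_EXTENSION = {
    "Win32": ".dll",
    "AIX": ".so", "DEC-OSF1": ".so", "FreeBSD": ".so",
    "IRIX": ".so", "Linux": ".so", "SunOS": ".so",
    "BSD-OS": ".o",
    "HP-UX": ".sl",
    "MacOSX": ".bundle",
}


def files_to_upload(filenames, token):
    """Pick the executable and the libraries (to be uploaded in binary mode)."""
    exe = ("HomepageSearchEngine.exe" if token == "Win32"
           else "HomepageSearchEngine.cgi.bin")
    ext = LIB_EXTENSION[token]
    return sorted(name for name in filenames if name == exe or name.endswith(ext))


def activate_unix_executable(install_dir):
    """Rename HomepageSearchEngine.cgi.bin to .cgi and chmod it to 755 (rwx r-x r-x)."""
    src = os.path.join(install_dir, "HomepageSearchEngine.cgi.bin")
    dst = os.path.join(install_dir, "HomepageSearchEngine.cgi")
    os.rename(src, dst)
    os.chmod(dst, 0o755)
    return dst


# Example: which of these package files belong to the binary upload on GNU/Linux?
print(files_to_upload(["HomepageSearchEngine.cgi.bin", "libhse.so", "hse.ini"], "Linux"))
```

The platform-independent files such as "hse.ini" are deliberately left out here, since they are uploaded in the following steps.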
Make sure that all these files are uploaded in binary mode (normally you don't need to
care about the mode, since the file extensions should force the correct one). On Unix,
rename "HomepageSearchEngine.cgi.bin" to "HomepageSearchEngine.cgi" afterwards and
chmod it to 755 (rwx r-x r-x).

General note on file permissions on Unix: Normally, you only have to set the correct
permissions for the executable file. All other files should be readable by the
executable without any change. On some server configurations, however, all these files
have to be chmod'ed to 644 (rw- r-- r--).

Point your webbrowser to the URL of HomepageSearchEngine's executable (the file
"HomepageSearchEngine.exe" or "HomepageSearchEngine.cgi", respectively), eg. ("tld"
stands for a top level domain such as "com")

    http://www.yourdomain.tld/cgi-bin/hse/HomepageSearchEngine.exe

You should see a message asking you to upload the configuration file.

6.2 Configuration file ("hse.ini")
----------------------------------

Edit the file "hse.ini" by setting proper values for its directives. Here is an
overview of all its directives, with the sections they are assigned to and their
default values:

       directive:                  default value:

The values in the following section are the only ones that *must* be checked and
probably edited:

(1)    Base directory
(1.1)  basepath                    ../../
(1.2)  baseurl                     /

All following settings are optional. You may want to keep the default values for the
first time.
(1.3)  cgiurl
(2)    Files ex-/including
(2.1)  exclude_dirs                _* cgi-bin hse
(2.2)  ban_list                    /*private*/ /robots.txt /Thumbs.db .log .BAK .css .js .cgi
(2.3)  search_always               /a_private_directory/a_public_file.html
(3)    Outfit of the input form
(3.1)  formtable_width             475
(3.2)  formtable_border-color      black
(3.3)  formtable_background-color  #dadada
(3.4)  formtable_background-image
(3.5)  formtable_alignment         left
(3.6)  formtable_input-size        38
(3.7)  helpwindow_width            620
(3.8)  helpwindow_height           690
(4)    International settings
(4.1)  charset                     iso-8859-1
(4.2)  date_format                 M D, Y
(4.3)  decimal_sep                 .
(4.4)  dir                         ltr
(4.5)  locale                      (system's default Locale string - only recognized
                                   by HomepageSearchEngine-lc)
(5)    Security tuning
(5.1)  debug_level                 0
(5.2)  max_found_files             1000
(5.3)  cgi_timeout                 25
(5.4)  allowed_referer_sites

All following settings are for advanced users and are not effective in the (free)
Light edition.

(7)    Categories
(7.1)  categories_nr               1 (in the Light edition: "none")
(7.2)  categories_nameNR
(7.3)  categories_dirNR
(7.4)  categories_sourceNR
(8)    Results pages customizing
(8.1)  template_url
(8.2)  results_global              search_string + options + time + summary +
                                   engine-links: 'Query the entire web using ' Google.com
(8.3)  results_details             icon:custom16x16 + url + size + matches + update
(8.4)  results_descriptions        250 characters + 1 matches (40)
(8.5)  highlight-style             background-color:yellow
(8.6)  target
(8.7)  results_href                highlightmatches + gotofirstmatch + maxsize:100
                                   (in the Light edition: "none")
(8.8)  results_previous_img        black
(8.9)  results_next_img            black

Setting a value for a directive must follow the syntax "directive = value", with each
directive standing on its *own* single line. Descriptions of all directives, including
possible values and examples, are in the .ini file itself. Comments are allowed in
lines beginning with the semicolon character (";").

Edit the configuration file with a text editor and save it. It can be saved in DOS,
Unix or Mac format.
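The "directive = value" syntax and the ";" comment rule can be checked before uploading. The following Python sketch is a hypothetical helper, not HSE's own parser: it assumes that surrounding whitespace around directives and values is insignificant, that the first "=" separates directive and value, and that lines without "=" can be ignored. `str.splitlines()` accepts DOS (CR LF), Unix (LF) and old Mac (CR) line endings alike.

```python
def parse_hse_ini(text):
    """Parse 'directive = value' lines; lines starting with ';' are comments."""
    settings = {}
    for raw in text.splitlines():  # handles DOS, Unix and Mac line endings
        line = raw.strip()
        if not line or line.startswith(";"):
            continue  # blank line or comment
        directive, sep, value = line.partition("=")
        if not sep:
            continue  # no "=": not a directive line (ignored in this sketch)
        settings[directive.strip()] = value.strip()
    return settings


sample = "; section 1 - Base directory\r\nbasepath = ../../\r\nbaseurl = /\r\n"
print(parse_hse_ini(sample))  # {'basepath': '../../', 'baseurl': '/'}
```

Printing the parsed dictionary is a quick way to spot a directive that accidentally ended up on the same line as another one.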
Then upload it into the installation directory (the same directory where the
executable file resides) or into another configuration directory (described in
section 6.8 below).

If you point your webbrowser to the URL of HomepageSearchEngine's executable again,
you should see the first graphical screen, guiding you through the next steps.

First, only configure section 1 - Base directory. Later, you should "fine tune" the
.ini file. Especially on larger websites, it is recommended to use the categories
feature (defined in section 7) to split your site into several categories, each
containing no more than a few MB of text. Test the categories setup by searching for
"list:files" in each category, without *and with* the "Search text of Non-HTML files"
checkbox switched on. This will also show you whether you should exclude some
directories and files by modifying the "exclude_dirs" and "ban_list" values.

6.3 IMPORTANT - Admin Area, Creating an Admin account and the Users file
------------------------------------------------------------------------

Now a link to the Admin Area appears. This is

    http://www.yourdomain.tld/cgi-bin/hse/HomepageSearchEngine.exe?admin

(or equivalent). You should follow this link now, because the first time this URL is
accessed, you will be asked to create a username/password pair for an Admin account.
Once the first account has been created, you can log in with these credentials in the
future to access administration tools such as a Console to run the
HomepageSearchEngine Shell Executable. You may also create additional Admin accounts.
So make sure not to forget your Admin login data!

The created username/password pairs are stored in a text file called "hse_users.cgi".
Although this users file has the extension ".cgi", it is not executable. The purpose
of this "false" extension is to prevent the file from being read, for higher security.
The passwords are stored encrypted with a one-way, DES-based algorithm, so they cannot
be decrypted; the users file works on both Windows and Unix platforms. It has the same
format as the AuthUserFile that the .htaccess method uses for protecting directories.
Therefore, you can also use the Admin Area to create such AuthUserFiles.

If you want to disable access to the Admin Area for security or any other reasons,
just copy an *empty* file called "hse_users.cgi" into the installation directory.

6.4 Central Style Sheet definition file ("/hse/HomepageSearchEngine.css")
-------------------------------------------------------------------------

Upload the .css file found in the "htdocs/hse" directory into HSE's web documents
directory (/hse). This central Style Sheet definition file is required by the HTML
template files described in sections 6.5 and 6.6 below.

6.5 Static HTML template file ("hse_template.html")
---------------------------------------------------

Next, upload "hse_template.html", found in "cgi-bin/hse", into the installation
directory. After refreshing your webbrowser at the search engine's URL, you will see
that the upper and lower parts of the page have changed. Edit this file to fit your
desired design and upload it again.

In its head, there is a reference to the central Style Sheet definition file mentioned
above. Be sure that its URL ("/hse/HomepageSearchEngine.css" by default) points to the
location where you have uploaded the file. You will then see the styles that affect
all elements HSE creates on the results pages. You may want to edit the style sheet.

Note that the design always stays the same, since this template produces static HTML.
This is the easiest way and may be sufficient for most webdesigners. If you are a more
demanding webmaster, you may want to use a dynamic HTML template instead (see next
section).

The border between the upper and lower part is marked by a special marker line in the
distributed template file. Never remove that line!
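The reason the border line must stay can be illustrated by how a template with such a marker is typically used: everything above it becomes the upper part of the results page, everything below it the lower part. This Python sketch is an illustration under assumptions, not HSE's code: the real marker text is defined in the distributed template and is passed in as a parameter here, "<!-- BORDER -->" is a made-up placeholder, and placing the generated results exactly between the two parts is an assumption.

```python
def split_template(html, marker):
    """Split a template at its border line into (upper part, lower part)."""
    lines = html.splitlines(keepends=True)
    for i, line in enumerate(lines):
        if line.strip() == marker:  # the border is a line of its own
            return "".join(lines[:i]), "".join(lines[i + 1:])
    raise ValueError("border line not found - never remove it from the template")


def render_page(template_html, marker, results_html):
    """Wrap generated results between the template's upper and lower part."""
    upper, lower = split_template(template_html, marker)
    return upper + results_html + lower


# Hypothetical template using a placeholder marker line.
template = "<html><body>\n<h1>My site</h1>\n<!-- BORDER -->\n<p>Footer</p>\n</body></html>\n"
print(render_page(template, "<!-- BORDER -->", "<p>3 matches</p>\n"))
```

Without the marker, there is no way to tell where the upper part ends, which is why the sketch refuses to render such a template.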
6.6 Dynamic HTML template file
------------------------------

As an alternative to the static HTML template, you may need a dynamic one in order to
use SSI, PHP or any other server-parsed script language. For this purpose, put a
template file into HSE's web documents directory (/hse). You can give it any name, but
it must have the extension your server requires (eg. .shtml for SSI, or .phtml or .php
for PHP). Make sure that the border line known from the static HTML template is still
present.

This package includes a sample SSI-enabled and a sample PHP-enabled dynamic HTML
template file, called "hse_template.shtml" and "hse_template.phtml" respectively, in
the directory "htdocs/hse".

Once you have edited and uploaded your custom dynamic HTML template, you must specify
its absolute URL in section (8.1) - template_url - of your .ini file, eg.

    template_url = http://www.yourdomain.tld/hse/hse_template.shtml

For the highest compatibility between different servers, you may omit the "http://"
prefix together with the host name and use something like

    template_url = /hse/hse_template.shtml

instead. The full URL will then be constructed from your server's environment variable
HTTP_HOST by prefixing "http://HTTP_HOST".

For example, if you have installed HSE at
"http://www.yourdomain.tld/cgi-bin/hse/HomepageSearchEngine.exe", HTTP_HOST equals
"www.yourdomain.tld" and the example above resolves to the URL
"http://www.yourdomain.tld/hse/hse_template.shtml".

If you have installed HSE at
"http://www.yourdomain.tld:81/cgi-bin/hse/HomepageSearchEngine.exe", HTTP_HOST equals
"www.yourdomain.tld:81" and the example above resolves to the URL
"http://www.yourdomain.tld:81/hse/hse_template.shtml".
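The resolution rule described above amounts to a single string concatenation. The following Python sketch reproduces the two worked examples; it assumes that any template_url beginning with "/" counts as host-relative and that every other value is used unchanged. The function name is hypothetical, and the exact test HSE applies internally is not documented here.

```python
import os


def resolve_template_url(template_url, http_host=None):
    """Expand a host-relative template_url by prefixing "http://HTTP_HOST"."""
    if http_host is None:
        # In a CGI request, the webserver provides HTTP_HOST in the environment.
        http_host = os.environ.get("HTTP_HOST", "")
    if template_url.startswith("/"):
        return "http://" + http_host + template_url
    return template_url  # already absolute, e.g. "http://www.yourdomain.tld/..."


print(resolve_template_url("/hse/hse_template.shtml", "www.yourdomain.tld:81"))
# http://www.yourdomain.tld:81/hse/hse_template.shtml
```

Because the port is part of HTTP_HOST, a server on a non-standard port needs no extra configuration.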
If the directory the dynamic HTML template resides in is password protected, you must
specify the login information (username and password) HSE should use to
authenticate, using the syntax

    template_url = http://username:password@www.yourdomain.tld/hse/hse_template.shtml

If your dynamic HTML template uses the "HTTP_ACCEPT_LANGUAGE" environment variable, you
can set its value by delivering "lang=LANG" to the .cgi action. A more detailed
description of this option follows in section 6.8.

6.7 Language files ("hse_lang.txt" and "hse_help.txt")
------------------------------------------------------

If you want the program's output in a default language other than English, also
upload "hse_lang.txt" and "hse_help.txt", found in the matching language directory of
the "lang" sub directory, into the installation directory.

The name of each language directory LANG is the 2-letter ISO 639-1 code of the
language it holds (optionally followed by a "-" character and a 2-letter regional
code). These codes and the associated international settings of the currently 24
supported languages are:

language code | language            | charset      | date_format | decimal_sep | dir
--------------------------------------------------------------------------------------
ar            | Arabic              | windows-1256 | DD/M/Y      | .           | rtl
cs            | Czech               | iso-8859-2   | D. M. Y     | ,           | ltr
da            | Danish              | iso-8859-1   | D. M Y      | ,           | ltr
de            | German              | iso-8859-1   | D. M Y      | ,           | ltr
el            | Greek               | iso-8859-7   | D M Y       | ,           | ltr
en            | English             | iso-8859-1   | M D, Y      | .           | ltr
es            | Spanish             | iso-8859-1   | M D, Y      | ,           | ltr
fi            | Finnish             | iso-8859-1   | D. M Y      | ,           | ltr
fr            | French              | iso-8859-1   | D M Y       | ,           | ltr
hu            | Hungarian           | iso-8859-2   | Y. M D.     | ,           | ltr
it            | Italian             | iso-8859-1   | M D, Y      | ,           | ltr
ja            | Japanese            | shift_jis    | Y.M.DD      | .           | ltr
nl            | Dutch               | iso-8859-1   | D M Y       | ,           | ltr
no            | Norwegian           | iso-8859-1   | D. M Y      | ,           | ltr
pl            | Polish              | iso-8859-2   | D. M Y      | ,           | ltr
pt            | Portuguese          | iso-8859-1   | M D, Y      | ,           | ltr
ro            | Romanian            | iso-8859-2   | D M, Y      | .           | ltr
ru            | Russian             | windows-1251 | D M Y       | ,           | ltr
sr            | (Latin) Serbian     | iso-8859-2   | D. M Y      | ,           | ltr
sv            | Swedish             | iso-8859-1   | D M Y       | ,           | ltr
th            | Thai                | tis-620      | DD/M/Y      | .           | ltr
tr            | Turkish             | iso-8859-9   | D. M Y      | ,           | ltr
zh-cn         | simplified Chinese  | gb2312       | Y.M.DD      | .           | ltr
zh-tw         | traditional Chinese | big5         | Y.M.D       | .           | ltr

If there are no language files for your preferred language, or if you want to change
the current wording to fit your needs, you can edit the distributed language files.
Please contact us before creating a new language file set if you want to get a full
version of our search engine for free.

6.8 Language and Configuration sub directories / delivery parameters ("lang" and "conf")
----------------------------------------------------------------------------------------

The "cgi-bin" directory of the distributed package includes two sub directories:
"lang" holds all available language directories containing the language files, and
"conf" is the container for configuration directories named "1", "2", ... to "9" that
can be filled with additional configuration sets. Upload them all into the
installation directory to get the option to switch between languages and
configuration sets. On Unix, make sure all directories are chmod'ed 755.

You can then change the language and its associated international settings (as
stated in the table above) by delivering the "lang" parameter to the executable, with
the name of the language directory (the language code) as its value. The separator for
blocks of thousands will always be " " (eg. "1 679 matches") unless you deliver the
lang=en parameter, which changes that character to "," (eg. "1,679 matches").
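The effect of the "lang" parameter on number formatting and on the international settings can be sketched as follows. This Python sketch copies only four rows of the table above, and the helper names are hypothetical. It treats a request without any "lang" parameter like any non-"en" value, because the text only says that delivering lang=en switches the separator to ",".

```python
# A few rows copied from the table in section 6.7: charset, date_format, decimal_sep, dir.
INTL = {
    "en": ("iso-8859-1", "M D, Y", ".", "ltr"),
    "de": ("iso-8859-1", "D. M Y", ",", "ltr"),
    "ar": ("windows-1256", "DD/M/Y", ".", "rtl"),
    "ja": ("shift_jis", "Y.M.DD", ".", "ltr"),
}


def format_count(n, lang=None):
    """Group thousands with "," only for lang=en, otherwise with " "."""
    sep = "," if lang == "en" else " "
    return f"{n:,}".replace(",", sep)


def intl_settings(lang):
    """Return the international settings for a delivered lang parameter."""
    charset, date_format, decimal_sep, direction = INTL[lang]
    return {"charset": charset, "date_format": date_format,
            "decimal_sep": decimal_sep, "dir": direction}


print(format_count(1679, "de") + " matches")  # 1 679 matches
print(format_count(1679, "en") + " matches")  # 1,679 matches
```

Note that the thousands separator and the decimal_sep column are independent settings: German uses "," as decimal separator but " " between thousands.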
The value of the "lang" delivery parameter is also sent to the server as the accepted
language (as "Accept-Language" HTTP header), so that the environment variable
"HTTP_ACCEPT_LANGUAGE" is set to this language code. This may be useful when a
dynamic, SSI-enabled HTML template (see section 6.6) is used that calls a script to
automatically display the date in the user's language format. You can try the
included dynamic HTML template file "hse_template.shtml" to see how this works.

For instance, calling

    http://www.yourdomain.tld/cgi-bin/hse/HomepageSearchEngine.exe?lang=de

changes the language and all its associated international settings to German and sets
the "HTTP_ACCEPT_LANGUAGE" environment variable to the value "de".

Similarly, you can deliver a "conf" parameter to the .cgi URL, with the name of a
configuration directory (a number from 1 to 9) as its value. For instance, calling

    http://www.yourdomain.tld/cgi-bin/hse/HomepageSearchEngine.exe?conf=1

forces the search engine not to use any of the default configuration files (residing
in the main installation directory), but to use those found in the directory "1"
instead. So you can use one installation of HomepageSearchEngine with up to 10
different configuration sets. Please see the WhatsThis.txt file residing in the conf/1
directory for details.

NOTE: Each uploaded configuration directory must at least contain the .ini file
("hse.ini"). The distributed package contains only one configuration directory, namely
"1". If you create or upload additional ones ("2" to "9"), make sure that all of these
directories contain an .ini file!

You can disable access to a configuration set by setting

    allowed_referer_sites = -

in the .ini file residing in the corresponding configuration directory. That .ini file
does not need to contain anything else. This may be especially useful if you only use
configuration sub directories, but don't want the main configuration set to be used.
In that case, place an .ini file as mentioned above into your main installation
directory. If the URL

    http://www.yourdomain.tld/cgi-bin/hse/HomepageSearchEngine.exe

is then called, nothing appears but a message like "ERROR: Sorry, this CGI application
is set not to be callable from the site 'www.yourdomain.tld'."

6.9 Setting the Language Locale - the locale-enabled HomepageSearchEngine Executable
------------------------------------------------------------------------------------

You can skip this section unless your site is in a language other than English.

If your site is hosted on a Unix server and contains words with characters other than
the English A-Z characters (characters beyond US-ASCII), eg. the German "Umlaute"
(Ä, Ö, Ü), you may observe the following problem (known as the "always-case-sensitive"
bug):

When a search string includes such characters, the search is always performed
case-sensitively, even if the 'Match case' checkbox is left unchecked (matchcase=off).
Also, restricting the search to whole words only (noparts=on) does not work properly
with words containing such characters: If such a character is the first or last one of
the word, the word will not be found. Searches with matchcase=on and noparts=off
always work properly.

The reason for this behaviour is that the system's default "Locale" (language
environment) differs from the one the characters in question belong to. This could be
fixed by switching the system's default Locale to the corresponding one. If this
setting cannot be changed system-wide, you can let HomepageSearchEngine use its own
custom Locale. Only use this feature if you are affected and case-insensitive searches
for such special characters are important to you, because this option requires more
system resources.

To test whether you are affected, include a word in both lowercase and uppercase
letters in one of your searchable documents, eg. "schönbrunn" and "SCHÖNBRUNN" (which
contains the German "Ö" Umlaut). Then search for this word ("schönbrunn"), leaving the
search options at their case-insensitive defaults. The result must then include the
file with both occurrences found. If only one occurrence is found instead of two, your
system is affected.

To solve this issue, use the "locale-enabled HomepageSearchEngine Executable" instead
of the default one. It is included as the file "HomepageSearchEngine-lc.exe" (for
Windows) or "HomepageSearchEngine-lc.cgi.bin" (for Unix). The latter should be renamed
to "HomepageSearchEngine-lc.cgi" or "HomepageSearchEngine.cgi" once residing on the
server. The Windows version is usually not needed, since the "always-case-sensitive"
bug does not seem to affect Windows platforms. Nevertheless, the Windows package also
contains a locale-enabled HomepageSearchEngine Executable, for testing purposes.

It is a good idea to first determine your system's default Locale by calling
HomepageSearchEngine-lc in Enhanced Debug mode (see section 9 for details):

    http://www.yourdomain.tld/cgi-bin/hse/HomepageSearchEngine-lc.exe?debug

(or equivalent). The shown Locale can be changed in section (4.5) of your .ini file. To
support German characters, the following setting may work:

    locale = German

If it doesn't, find out which Locale strings are supported by the current system
configuration of your host machine.

6.10 Shell Executable: creating file-lists, indexes and using other tools (Pro edition only)
--------------------------------------------------------------------------------------------

Especially on large websites, you may want to speed up searches by searching an index
instead of searching the files directly. The content of all matching HTML files will
be stored in a tab-separated text file called "hse_indexNR_html.txt". The file
"hse_indexNR_nonhtml.txt" holds the content of all matching Non-HTML files. Both files
together form the index file pair for category NR.
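The file names of an index file pair follow directly from the category number NR; the names of the main pair without a category number ("hse_index_html.txt" and "hse_index_nonhtml.txt") are listed in the '-cat' option help of the Shell Executable. The Python sketch below builds both names and checks whether the *complete* pair exists, which is the condition for index search. It assumes that the index files reside in the directory passed in (for example the installation or configuration directory); that location and the helper names are assumptions.

```python
import os


def index_pair_names(nr=None):
    """Names of the index file pair for category NR, or of the main pair if NR is None."""
    tag = "" if nr is None else str(nr)
    return f"hse_index{tag}_html.txt", f"hse_index{tag}_nonhtml.txt"


def has_index_pair(directory, nr=None):
    """True only if *both* files exist; otherwise the flat or on-the-fly method applies."""
    return all(os.path.isfile(os.path.join(directory, name))
               for name in index_pair_names(nr))


print(index_pair_names(3))  # ('hse_index3_html.txt', 'hse_index3_nonhtml.txt')
```

A single leftover file of a pair is therefore not enough: the category is searched without the index until both files are present.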
If the index file *pair* for the current category is present, it will be used;
otherwise, the flat or the on-the-fly search method will be applied.

To create the index files, go into the installation directory on the command prompt
(shell) and run the executable file. This requires shell access (via Telnet, SSH or
direct access). On Windows, you have to type something like

    cd F:\InetPub\www.yourdomain.tld\cgi-bin\hse
    HomepageSearchEngine

(with or without its ".exe" extension), while on Unix, you have to type something like

    cd /web/www.yourdomain.tld/cgi-bin/hse
    ./HomepageSearchEngine.cgi

If you do not have shell access, you can use the web based Console, which is part of
the Admin Area, to run the executable file on the shell (the executable file then
behaves as the "Shell Executable"). Just point your webbrowser to

    http://www.yourdomain.tld/cgi-bin/hse/HomepageSearchEngine.exe?admin

(or equivalent). Remember that you need to log in with a username/password pair
created in step 6.3 above.

Executing the Shell Executable with the '-help' argument shows how it can be used:

Usage: HomepageSearchEngine
         spider [-conf=DIR] [-cat=NR] [-lang=LANG] [-depth=LEVEL] [-max=NR_URLS] -url=URL [-debug] [-nobackup] [-batchmode] [-help]
       | geturls [-conf=DIR] [-cat=NR] [-lang=LANG] [-nobackup] [-batchmode] [-help]
       | makelist [-conf=DIR] [-cat=NR] [-nononhtml | -nohtml] [-nobackup] [-batchmode] [-help]
       | index [-conf=DIR] [-cat=NR] [-nononhtml | -nohtml] [-nobackup] [-nocheck] [-batchmode] [-help]
       | changetitles [-conf=DIR] [-cat=NR] [-nobackup] [-batchmode] [-help]
       | changeurls [-conf=DIR] [-cat=NR] [-nobackup] [-batchmode] [-help]
       | memory [-wait=SEC]

Commands:
  spider        Spiders a remote site recursively, beginning at URL down to a given LEVEL.
  geturls       Get remote URLs and stores their content in files on your site.
  makelist      Makes the file-list(s) required for indexing all or specific categories.
  index         Indexes all or specific categories.
changetitles Changes titles in specific index file pairs. changeurls Changes URLs in specific index file pairs. memory Shows information about the memory used by this application. Options available for the commands 'spider', 'geturls', 'makelist', 'index', 'changetitles' and 'changeurls': -conf=DIR Specifies DIR (1..9) to be used as configuration directory ./conf/DIR If not set, the main directory (the one you are currently in) will be used. -cat=NR Specifies the category number NR (1..25) to be used. With the 'spider' command, it tells which URL-list file should be created. With the 'geturls' command, it tells which URL-list file should be read. If not set, the main URL-list file (hse_urllist.csv) will be used. With the 'makelist' command, it tells which file-list file pairs should be created. With the 'index' command, it tells which index file pairs should be created. If not set, all file pairs will be created. With the 'changetitles' or 'changeurls' command, it tells which index file pair should be modified. If not set, the main index file pair (hse_index_html.txt and hse_index_nonhtml.txt) will be modified. -nobackup Does not backup a file before overwriting it. Useful when available disk space is limited to a small size. -batchmode Turns on batch mode (Does not ask any questions). -help Displays help for the given command. Without a command, displays general help (this screen). Additional Options available for the commands 'spider' and 'geturls': -lang=LANG Sends the LANG value (ISO 639 language code; eg. 'en') to the server as accepted language. Additional Options available for the command 'spider': -url=URL (Mandatory) The URL the spider should begin collecting (internal) links from. -depth=LEVEL The hierarchical LEVEL (0..10) how deep links should be followed recursively. Defaults to 3. -max=NR_URLS The maximum number of URLs NR_URLS (1..1000) to be got and checked. Defaults to 100. -debug Prints additional information (verbose mode) useful for debugging. 
Additional Options available for the commands 'makelist' and 'index': -nononhtml Does not create the file-list or index for Non-HTML files. -nohtml Does not create the file-list or index for HTML files. Additional Option available for the command 'index': -nocheck Does not check the index file for correct content after creating it to save system resources May be useful on systems with little available resources to avoid terminating prematurely. Options available for the command 'memory': -wait=SEC Specifies a time SEC (1..300) in seconds that the process waits before being terminated. Useful to find out memory consumption with other tools during that time. As you can see, the the Shell Executable can be called together with a command (a single word) and a number of options (always begins with the "-" character). To index a site, a file-list must first be made (using the 'makelist' command) which is then used to create the index files (using the 'index' command). Detailed information can be obtained by executing "HomepageSearchEngine index -help". The most powerful way to index your site would be if you let the index files to be created automatically every day. This could be done on Unix using the shell script "hse_cronjob.sh" or on Windows using the the batch script "hse_cronjob.bat" found in the "tools/hse" directory. Details are available in the "ReadMe.txt" file residing there. Instead of creating the index files directly on the production server, you can also create them on your local hard drive where you have mirrored the site, regardless of the platform. Just be sure to use the correct executable on your development platform. No webserver is required to be installed. Finally upload the index files via FTP onto the production server. If you cannot or don't want to create/use the index files for some reasons like limited resources, you can still improve the search speed that would result from an on-the-fly search by applying the flat search method. 
To do so, only make the file-list pair by executing the 'makelist' command and skip the 'index' command.

6.11 IMPORTANT - Testing your installation: search for "list:files"
-------------------------------------------------------------------

Once the installation is finished, the best thing to do is to call the search engine in the "advanced search" form and search for the term "list:files". You will then see which search method will be applied and which files will be searched. If the resulting list has been collected on-the-fly, you will also see how many files and directories had to be inspected, as well as the CPU time required. This may reveal unnecessary items and show that you need to add some directory names to the "exclude_dirs" directive of the .ini file.

REPEAT this step after *disabling* all 4 checkboxes for the parts of the web pages and *enabling* the (last) checkbox "Search text of Non-HTML files". You will then see which Non-HTML files will also be searched. You may find unwanted files that have made the index file very large. In this case, reconfigure the .ini file and re-index your site.

If you have set up categories, check all of them.

Note that the "list:files" output does not work when the "debug_level" directive in your .ini file is set to a value higher than 2. So if you want to view this list, make sure this directive keeps its default value of 0 or is set to 1 or 2.

6.12 Excluding specific sections within HTML files from being searched
----------------------------------------------------------------------

You may want to exclude certain areas within some HTML files from being searched. Put such areas inside a span or div tag assigned to the "HSE-nosearch" class to force HomepageSearchEngine not to look inside these sections:

    <span class="HSE-nosearch">This text will never be looked up by HomepageSearchEngine</span>
    <div class="HSE-nosearch">This method can also be used to ban text</div>
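To make the effect of the "HSE-nosearch" class concrete, here is a rough Python approximation of text extraction that skips such sections. It is not HomepageSearchEngine's own code. It assumes well-formed HTML with properly closed tags and matches the class name exactly and case-sensitively. How HSE itself treats malformed markup or differently cased class names is not documented here.

```python
from html.parser import HTMLParser

# Elements without a closing tag; they must not change the nesting count.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class NoSearchStripper(HTMLParser):
    """Collect page text, skipping every element whose class list contains
    "HSE-nosearch", together with everything nested inside it."""

    def __init__(self) -> None:
        super().__init__()
        self.skip_depth = 0  # > 0 while inside an excluded element
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.skip_depth or "HSE-nosearch" in classes:
            self.skip_depth += 1  # track nesting to find where the section ends

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)


def searchable_text(html: str) -> str:
    """Return the text content of `html` without the HSE-nosearch sections,
    with runs of whitespace collapsed to single spaces."""
    parser = NoSearchStripper()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


if __name__ == "__main__":
    page = ('<p>Vienna <span class="HSE-nosearch">secret <b>bold</b></span> palace</p>'
            '<div class="box HSE-nosearch">hidden</div><p>end<br>line</p>')
    print(searchable_text(page))  # Vienna palace end line
```

Counting the depth of every nested element is what lets a `<div>` section contain further markup and still be skipped as a whole.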
6.13 Options to call the search engine
--------------------------------------

Since HomepageSearchEngine creates a pre-built input form automatically, there is no need to call the search engine from another form. Webmasters who want to use their own design throughout may want to disable the pre-built input form by setting

    formtable_width = none

in section (3.1) of the .ini file. An input form of your own that calls HomepageSearchEngine should then include at least something like the following HTML code:
This creates a small text input box (15 characters wide). When the ENTER key is hit, it calls the search engine (at the location '/cgi-bin/hse/HomepageSearchEngine.exe') with the search terms entered into that box. The text input field must be named "terms" so that the parameter "terms" is delivered to the search engine. The "submit" delivery parameter tells the search engine to print an error message when no search term has been submitted.

If you have enabled the pre-built input form, you can force it to be shown as the advanced form by adding this additional code within the form area:

    <input type="hidden" name="extra" value="on" />

Your own form can pre-select all the options available in that advanced input form. Most people add those parameters as hidden form fields, in the same way as shown above. When their names are not included in the calling form, these delivery parameters take the following default values:

    name:      default value:  meaning:
    and        off             do not combine all search terms with logical AND
    extra      off             do not show the input form in advanced mode (with extra options), but in simple mode
    matchcase  off             do not restrict the search to matching case
    noparts    off             do not restrict the search to whole words only
    hits       10              show at most 10 hits per results page
    sort       hits            sort results by the number of matches (hits)

To change a default "off" value to "on", you must explicitly add these form fields with their new values. The same applies if you want to change the "hits" value from "10" to a number from 2 to 200. In the same way, the "sort" value can be changed to "date" or "name".

The delivery parameters for the five possible search sources have the names shown in the following table.
If *none* of them is specified explicitly (none of their names is included in the calling form), they take the following values:

    title    on   search in the title parts of the web pages
    meta     off  do not search in the description and keywords meta tag parts of the web pages
    text     on   search in the full text of the web pages
    alt      off  do not search in the alternative texts of the images in the web pages
    nonhtml  off  do not search text of Non-HTML files

To change this preset combination, you must include every parameter that should be set to "on". For instance, to include Non-HTML text files in addition to the titles and full text of web pages, you must include all three parameters "title", "text" and "nonhtml" with their value set to "on".

In addition to the parameters of the advanced input form, you may set a "cat" parameter with a value from 1 to 25 to specify the category to be searched. A "conf" parameter with a value from 1 to 9 selects the configuration set. A "lang" parameter with the language code as its value selects the language settings (see section 6.8 for details).

If you provide a special parameter called "append", its value will be appended to the links on the results page. This feature may be useful for shopping carts that use a dynamically generated ID to identify the shopper. That ID (the "append" value) will re-appear in the URL of the resulting links in the form 'URL?append=APPEND'. Here, 'URL' is the URL of the found file and 'APPEND' is the value assigned to the "append" parameter.
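The parameter rules above can be checked with a small Python sketch that builds a search URL. The rules it encodes are: only non-default values need to be sent, all wanted search sources must be listed once any one is sent, and the documented value ranges apply. The sketch assumes the engine accepts these parameters in a GET query string, as the "?debug" and "?admin" URLs elsewhere in this manual suggest. It omits "submit", because that parameter's expected value is not given here. `build_search_url` and `result_link` are hypothetical helpers. Non-ASCII terms are percent-encoded as UTF-8 by `urlencode`'s default, and which encoding your installation expects is not stated in this manual.

```python
from urllib.parse import urlencode

SWITCHES = ("and", "extra", "matchcase", "noparts")    # all default to "off"
SOURCES = ("title", "meta", "text", "alt", "nonhtml")  # the five search sources


def build_search_url(base, terms, switches=(), sources=None, hits=None,
                     sort=None, cat=None, conf=None, lang=None):
    """Build a GET URL from the delivery parameters described above.

    Only values that differ from the defaults are sent. If `sources` is
    given, it must name *every* source to search, because sending any
    source parameter cancels the title+text preset.
    """
    if set(switches) - set(SWITCHES):
        raise ValueError("unknown switch")
    if sources is not None and set(sources) - set(SOURCES):
        raise ValueError("unknown search source")
    if hits is not None and not 2 <= hits <= 200:
        raise ValueError("hits must be between 2 and 200")
    if sort is not None and sort not in ("hits", "date", "name"):
        raise ValueError("sort must be 'hits', 'date' or 'name'")
    if cat is not None and not 1 <= cat <= 25:
        raise ValueError("cat must be between 1 and 25")
    if conf is not None and not 1 <= conf <= 9:
        raise ValueError("conf must be between 1 and 9")

    params = [("terms", terms)]
    params += [(name, "on") for name in SWITCHES if name in switches]
    if sources is not None:
        params += [(name, "on") for name in SOURCES if name in sources]
    for name, value in (("hits", hits), ("sort", sort), ("cat", cat),
                        ("conf", conf), ("lang", lang)):
        if value is not None:
            params.append((name, str(value)))
    return f"{base}?{urlencode(params)}"


def result_link(found_url, append):
    """Link shape described for the "append" parameter: URL?append=APPEND.
    The value is inserted unencoded, exactly as in that description."""
    return f"{found_url}?append={append}"


if __name__ == "__main__":
    print(build_search_url("/cgi-bin/hse/HomepageSearchEngine.exe", "schloss wien",
                           switches={"and"}, sources={"title", "text", "nonhtml"},
                           hits=20, cat=3))
    print(result_link("http://www.yourdomain.tld/shop/item.html", "A1B2"))
```

The same name/value pairs could equally be written as hidden form fields. The URL form is simply easier to test in a browser.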
Suppose you set something like

    results_href = /cgi-bin/hse/passurl.cgi?url=URL

in section (8.7) of your .ini file. Your own CGI application ("passurl.cgi" located at "/cgi-bin/hse") can then split the ID out of the delivered URL and redirect the result link to your shopping cart application, including the ID that the cart application originally generated. Thus, the shopper's cart will not be dropped. Such a helper application "passurl.cgi" is included in the "/cgi-bin/hse" directory of the distributed package. It is a plain-text Perl script, and instructions on how to use it can be found in the file itself.

6.14 Optional turn from the Trial to the Freeware version with the public key ("hse_key.cgi")
---------------------------------------------------------------------------------------------

You can switch the behaviour of the search engine from the "Pro" Trial to the "Light" Freeware version. Download

    http://www.homepagesearchengine.com/_download/HSE_Freeware.zip

and copy the key file "hse_key.cgi.bin" found in that additional package into the directory where the executable file resides. Finally, rename the key file to its final name "hse_key.cgi".

____________________________________________________________________________________
7. Special Shell Executable features and using the cronjob script (Pro edition only)

As you have learned in the section on indexing your site (6.10), the Shell Executable offers more commands than those used for indexing. This section describes its other useful features.

7.1 Spidering and URL Grabbing: Searching dynamic sites or the content of any URLs
----------------------------------------------------------------------------------

This feature allows you to search your site if it consists of dynamically created pages, as is often the case with database-driven sites such as web shops.
This feature may also be useful if you have a collection of pages on different sites worldwide that cover one topic, and you want visitors to search them in one simple step on your website. HomepageSearchEngine allows you to "grab" the content of such URLs and store it on your site, where it can be indexed locally. The index can then be searched, with the results pointing to the original URLs.

It is convenient to first create a sub directory under your document root (let's assume a directory "/web_documents/_grabbed_stuff") that contains only the grabbed content. Assign this content to a category of its own (e.g. category no. 1), so make sure you have set the number of categories to at least 1. Your .ini file should therefore contain something like this:

    basepath = /web_documents/
    baseurl = http://www.yourdomain.tld/
    categories_nr = 3
    categories_name1 = collection of grabbed pages
    categories_dir1 = _grabbed_stuff

Generating the corresponding index file pair for category 1 consists of the following 4 tasks:

(1) First, you need to provide a URL-list file. It specifies which URLs to get and where to store them on your site, relative to the directory represented by the specified category. In the example above (category 1 represents the directory "_grabbed_stuff"), you need a URL-list file called "hse_urllist1.csv". It contains remote_URL|local_relative_path lines such as

    http://www.HomepageSearchEngine.com/|HSE_index.html
    http://www.HomepageSearchEngine.com/faq_en.phtml|HSE_faq_en.html
    http://www.somedomain.tld/somescript.php?someparameter=somevalue|somefilename.html

A sample file "hse_urllist1.csv" is included in the "cgi-bin/hse" directory of the distributed package. URL grabbing results in a set of static HTML files, so store them all with the ".html" extension. This applies regardless of the extension of the original file that produced the (possibly dynamic) content.
Therefore, each line in the URL-list file should end with ".html".

The URL-list file can also be created automatically using the "spider" command. It spiders one or more entire websites, or parts of them, and stores all internal links to HTML files in the URL-list file. Each 'remote_URL' is associated with a unique 'local_relative_path'. The spider starts at a given URL and follows links down to a default or specified limit, similar to the "GNU Wget" utility. Unlike Wget, however, HSE's spider stores a URL only after verifying that it holds content of the 'text/html' MIME type, including dynamic content created by .cgi files. Detailed information can be obtained by executing "HomepageSearchEngine spider -help". This command would spider the 'www.some_domain.tld' site (within the default limits):

    HomepageSearchEngine spider -cat=1 -lang=en -url=http://www.some_domain.tld/ -batchmode

Some URLs respond with different content depending on the "Accept-Language" value you send. Suppose you visit the URL "http://www.HomepageSearchEngine.com/" with "en" set in your browser as your primary accepted language. You will get the English start page "index.phtml". If you have set "de", you will get the German page "index_de.phtml" instead. When used with the "spider" command, the HomepageSearchEngine Executable acts as a browser (HTTP client), and you can set your preferred accepted language with the "-lang" option. Thus, providing the option "-lang=de" instead of "-lang=en" would spider the German part of the specified site instead of the English one.

(2) Once you have a URL-list file, you can grab the content (get the content of the URLs and store it on your site). This is handled by the "geturls" command. Detailed information can be obtained by executing "HomepageSearchEngine geturls -help".
Start the grabbing process by entering the following command:

    HomepageSearchEngine geturls -cat=1 -lang=en -batchmode

You may want to grab URLs only from your own site (to convert dynamic content into a searchable, static site). In that case, each remote_URL can begin with "/" instead of "http://www.yourdomain.tld/". The "/" will then be replaced by the baseurl value of your .ini file. Note that the baseurl value must then be fully qualified (beginning with 'http://'). This is convenient because the same URL-list file can be used on both the development and the production server. Only the baseurl value has to differ between the .ini files used by each server.

To avoid creating .bak backup files of each .html file that already exists (from the previous grabbing), add the "-nobackup" option:

    HomepageSearchEngine geturls -cat=1 -lang=en -nobackup -batchmode

(3) The next step is to create the index file pair of the grabbed content. This requires making the corresponding file-list file pair first. As you already know from section 6.10 above, this is handled by the "makelist" command, followed by the "index" command:

    HomepageSearchEngine makelist -cat=1 -batchmode
    HomepageSearchEngine index -cat=1 -batchmode

If system resources are limited, use these alternative commands instead. They consume as few system resources as possible:

    HomepageSearchEngine makelist -cat=1 -nononhtml -nobackup -batchmode
    HomepageSearchEngine makelist -cat=1 -nohtml -nobackup -batchmode
    HomepageSearchEngine index -cat=1 -nononhtml -nobackup -nocheck -batchmode
    HomepageSearchEngine index -cat=1 -nohtml -nobackup -nocheck -batchmode

(4) Finally, change the URLs in the created index back to their original locations, so that the results link to those locations. This is handled by the "changeurls" command. Detailed information can be obtained by executing "HomepageSearchEngine changeurls -help". The same URL-list file used in step 1 is used again.
The command to execute is

    HomepageSearchEngine changeurls -cat=1 -batchmode

NOTE: You may prefer the on-the-fly search method to the indexed one. In that case, you can still use this URL-list file to redirect your visitors to the changed locations. Make sure you have set the "changeurls" key word in section (8.7) of your .ini file, e.g.

    results_href = changeurls + highlightmatches + gotofirstmatch

You can use the script "hse_cronjob.sh" (on Unix) or "hse_cronjob.bat" (on Windows) to perform those steps automatically, e.g. every day at 5 o'clock in the morning. These files, including detailed instructions, can be found in the "tools/hse" directory.

7.2 Searching different websites hosted on the same computer
------------------------------------------------------------

The example above allowed us to search a collection of URLs as category 1. You may have several websites accessible via different domain names hosted on the same computer. One HSE installation can make all these sites searchable, each site as a category of its own.

Suppose a site accessible as "http://www.firstdomain.tld/" is hosted in the directory "/web_documents/firstdomain.tld". Another site, "http://www.seconddomain.tld/", is hosted in the directory "/web_documents/seconddomain.tld". You may want to make them searchable as category 2 and category 3. In addition to the content mentioned in section 7.1 above, your .ini file should contain something like this:

    categories_name2 = the www.firstdomain.tld site
    categories_name3 = the www.seconddomain.tld site
    categories_dir2 = firstdomain.tld
    categories_dir3 = seconddomain.tld

Generating the corresponding index file pair for category 2 (and also cat.
3) consists of the following 2 tasks:

(1) Create the index file pair:

    HomepageSearchEngine makelist -cat=2 -batchmode
    HomepageSearchEngine index -cat=2 -batchmode

The resulting index holds URLs beginning with "http://www.yourdomain.tld/firstdomain.tld/", but you want them all to begin with "http://www.firstdomain.tld/" instead.

(2) Change all URLs to the real ones:

    HomepageSearchEngine changeurls -cat=2 -batchmode

This requires a "hse_urllist2.csv" file. Its *first line* must contain a single new_URL|local_relative_path entry (any further lines are ignored), in a format following this example:

    http://www.firstdomain.tld/*|*

Note that the second field (after the "|" separator) contains only an asterisk ("*"). This means that the local_relative_path is a wildcard for all local relative paths. As a result, all local URLs in the index files will be changed. In this example, these are all URLs beginning with "http://www.yourdomain.tld/firstdomain.tld/". These local start URLs are then replaced by the new start URL "http://www.firstdomain.tld/". A sample file "hse_urllist2.csv" is included in the "cgi-bin/hse" directory of this package.

___________________________________
8. Updating from a previous version

To explore all new features, a "clean" installation is recommended, as described in section 8.12 below. The impatient may instead keep using parts of the current installation by following the update instructions below. Be sure to back up your current installation files before upgrading. Start with the instructions for your version.

8.1 Updating from v3.1
----------------------

(1) Copy the content of "HomepageSearchEngine_header.txt" into the upper part of "hse_template.html" and the content of "HomepageSearchEngine_footer.txt" into the lower part of "hse_template.html". Then upload "hse_template.html". Repeat this step for all configuration sets.

(2) Delete all _header.txt and _footer.txt files.
(3) Continue with the instructions in the next section.

8.2 Updating from v3.2
----------------------

(4) In your .ini file(s), first rename the directive "results_details" to "results_global", then rename "results_show_urls" to "results_details". Assign the preferred values to them (or leave them blank to apply the default settings).

(5) If you use the key word(s) "engine:ENGINE-NAME" in your .ini file(s), change ENGINE-NAME as follows: "altavista" to "AltaVista.com" and "google" to "Google.com". Additionally, you may want to use new ENGINE-NAMEs.

(6) Continue with the instructions in the next section.

8.3 Updating from v3.21
-----------------------

(7) In your .ini file(s), rename the directive "highlight-color" to "highlight-style" and assign the preferred value to it (or leave it blank to apply the default settings). Refer to the new .ini file for the new syntax of this setting.

(8) Continue with the instructions in the next section.

8.4 Updating from v3.3 or 3.31
------------------------------

(9) In your .ini file(s), rename the directive "chars_alignment" to "dir" and assign the preferred value to it (or leave it blank to apply the default settings). Refer to the new .ini file for the new syntax of this setting.

(10) If you use traditional Chinese via the "lang" delivery parameter, change its value from "zh-trad" to "zh-tw".

(11) In your .ini file(s), you may need to modify the values of the "results_global" directive.

(12) Continue with the instructions in the next section.

8.5 Updating from v3.32
-----------------------

(13) Upload the "HomepageSearchEngine.css" file (see section 6.4) and make sure your HTML template file points to its location (see section 6.5).

(14) Continue with the instructions in the next section.

8.6 Updating from v3.33
-----------------------

(15) If you want to exclude all sub directories from being searched, set "exclude_dirs = *" in your .ini file(s).

(16) Continue with the instructions in the next section.
8.7 Updating from v3.34
-----------------------

(17) Unless your users file is called "HomepageSearchEngine_users.cgi", rename all configuration files within the installation directory from "HomepageSearchEngine..." to "hse...".

(18) Continue with the instructions in the next section.

8.8 Updating from v3.35
-----------------------

(19) If you use a key with the deprecated name ending in ".key", rename it to end with "_key.cgi".

(20) Make sure you do not miss any of the style sheet definitions used in the shipped "HomepageSearchEngine.css".

(21) Continue with the instructions in the next section.

8.9 Updating from v3.36
-----------------------

(22) Replace all language files. NOTE: In some language core files, line 61 and the last lines (lines 70 to 73) are in English instead of the proper language. The same applies to the rear part of line 5 and to lines 32 to 33 of some language help files. Please translate them and send us the updated file(s)! Thank you!!

(23) If your users file is still called "HomepageSearchEngine_users.cgi", you must now rename all configuration files within the installation directory from "HomepageSearchEngine..." to "hse...".

(24) Continue with the instructions in the next section.

8.10 Updating from v3.37
------------------------

(25) Upload (or replace, respectively) all library files (".dll", ".so", ".o", ".sl" or ".bundle") in(to) the main directory.

(26) In your .ini file(s), you may need to add the "title" key word to the "results_details" directive if you have changed its default (empty) value.

(27) You may have specified a category that represents a directory but not its sub directories. In that case, exclude the sub directories by setting "categories_sourceNR = -/*/" in your .ini file(s) instead of "categories_dirNR = .".

(28) If you use simplified Chinese via the "lang" delivery parameter, change its value from "zh" to "zh-cn" and also rename the corresponding language directory to "zh-cn".

(29) Replace the executable file.
(30) If you are using index files, delete them and re-index your site, using the "makelist" command before the "index" command.

(31) If you are affected by the "always-case-sensitive" bug, you may alternatively use the locale-enabled HomepageSearchEngine Executable. See section 6.9 for details.

(32) Continue with the instructions in the next section.

8.11 Updating from v3.4
-----------------------

(33) If you are using a custom input form that preserves the previous form settings, replace "hse_customform.js".

(34) If you use Japanese via the "lang" delivery parameter, change its value from "jp" to "ja" and also rename the corresponding language directory to "ja".

(35) Upload (or replace, respectively) all library files unless you already did so in step (25).

(36) Replace the executable file unless you already did so in step (29).

(37) Updating is now complete.

8.12 Clean installation and updating from versions earlier than 3.1
-------------------------------------------------------------------

(1) Make a new "clean" installation into a new directory, e.g. called "hse_new".

(2) Once the new installation works fine, remove the old installation (e.g. the directory "hse") and rename the new directory "hse_new" to "hse".

____________
9. Debugging

If you think the CGI application doesn't run properly, you can run it in "Debug mode", which may help you to find bugs. Start the application by typing the following URL into your browser's address field:

    http://www.yourdomain.tld/cgi-bin/hse/HomepageSearchEngine.exe?debugmode

(or equivalent). If any system messages (errors, warnings, etc.) occur, they will be passed to the browser window. The Debug mode can be extended to the "Enhanced Debug mode" via the start URL

    http://www.yourdomain.tld/cgi-bin/hse/HomepageSearchEngine.exe?debug

(or equivalent). This may also help to find bad configurations.
Note that this output is not available when the "debug_level" directive in your .ini file is set to a value higher than the default (0).

________________
10. Known issues

The following behaviour is currently known not to work properly under some circumstances:

- On some rarely used systems, the highlightmatches/gotofirstmatch feature does not work due to a module incompatibility. In that case, clicking a results link shows a message explaining this and how to disable the feature.

- On systems with little available memory, an Error 500 ("Server Error") may occur when clicking a result link to a larger file while the highlightmatches/gotofirstmatch feature is enabled (which is the default). If you cannot get or configure more available memory, change section (8.7) of your .ini file as described below, in the given order, until you succeed:
  (1) Disable the gotofirstmatch feature by setting 'results_href = highlightmatches'
  (2) Decrease the SIZE value of the 'maxsize:SIZE' key word - for instance, set 'results_href = highlightmatches + maxsize:80'
  (3) Disable both the highlightmatches and gotofirstmatch features entirely by setting 'results_href = none'

___________
11. To-Do's

The following tasks are currently on the To-Do list for the future.
11.1 Internationalisation
-------------------------

+ Translation of line 61 of the language core file for the following languages: Arabic, simplified Chinese, traditional Chinese, Czech, Danish, Finnish, French, Greek, Hungarian, Italian, Norwegian, Romanian, (Latin) Serbian, Swedish, Thai and Turkish
+ Translation of lines 70-73 of the language core file for the following languages: simplified Chinese, traditional Chinese, Czech, Danish, French, Greek, Norwegian, Swedish, Thai and Turkish
+ Translation of the rear part of line 5 of the language help file for the following languages: simplified Chinese, traditional Chinese, Czech, Danish, French, Greek, Norwegian, Swedish, Thai and Turkish
+ Translation of lines 32-33 of the language help file for the following languages: Arabic, simplified Chinese, traditional Chinese, Czech, Danish, Finnish, French, Greek, Hungarian, Italian, Norwegian, Romanian, Swedish, Thai and Turkish
+ Language help file for (Latin) Serbian
+ Hebrew translation (everybody is welcome to create it - as always, you will get a free full version)
+ Translations into all other not yet supported languages are always welcome, too

11.2 New features
-----------------

+ Support the "-depth" option for the "spider" command
+ The spider should be able to exclude some URLs, probably based on the 'robots.txt' file
+ Passing a search query to user-defined search engines
+ Support for getting URLs via the HTTPS protocol in the integrated HTTP client
+ Extracting and searching IPTC information in images
+ Support for PDF files
+ Statistical options
+ Support for searching UTF-8 encoded web pages
+ Web-based configuration of the .ini file(s)
+ CD-ROM version for Win32
+ MySQL support is planned for HSE v4.0

If you want to add tasks to this list, we are always open to hear from you. Many of the current features have been implemented at the request of our customers. Please check the next section (12. Support) before contacting us.

___________
12. Support

If you have problems or suggestions, please first check the FrequentlyAskedQuestions at

    http://www.HomepageSearchEngine.com/faq_en.phtml

Be sure to use the latest version available if you run into a problem or are looking for improved functionality. The latest version may already have solved your problem and/or implemented your requested feature.

If your problem still cannot be solved, don't hesitate to contact us using our feedback form at

    http://www.homepagesearchengine.com/feedback_en.phtml#2

or send an e-mail to info@anet.at (subject=HomepageSearchEngine). If possible, include the .ini file in question. If your server runs on a Unix platform, please be sure to install platform.cgi first and include the *full* URL to it in your message. Thank you.

___________
13. Credits

Thanks go out to (surnames in alphabetical order):

+ Geir Juul Aslaugberg for his translation into Norwegian
+ Rémy Bieber for his translation into French
+ "David" Chang Shih Chun for his translation into Japanese
+ Ricardo Contreras for his translation into Spanish
+ Miguel Duclós for his translation into Portuguese
+ Emad Felemban from the Umm Al-Qura University (http://www.uqu.edu.sa) for his Arabic translation of the language core file
+ Abdullah Ghaze Fitaihi for his Arabic translation of the language help file
+ Nicola Gatta for her translation into Italian
+ Emanuele "lele" Goldoni for his update of the Italian translation
+ Jozsef Tamas Herczeg for his translation into Hungarian
+ Mats Ingelström for his translation into Swedish
+ V.J. Janak for his translation into Russian
+ Pryme Sinista Jinx (http://www.linuxtr.com) for his translation into Turkish
+ Elena N. Kharlamova for her update of the Russian translation
+ Markus "Koppi" Koppenberger for providing access to his MacOS X machine
+ Yannis Kotsis for his translation into Greek
+ Fabian Milos for his translation into Czech
+ Sanja Nesic for her translation into (Latin) Serbian
+ Wojciech Nowakowski for his update of the Polish translation
+ Krzysztof Palka for his translation into Polish
+ Xuguang Pan for his translation into simplified Chinese
+ Fragiskos Remoundos for his Greek translation of the language help file
+ Adrian Roye for his translation into Romanian
+ Kimura Shinsuke for his update of the Japanese translation
+ Frans Storr-Hansen for his translation into Danish
+ Ylikorkala Tapio for his translation into Finnish
+ Itamar Vieira for his update of the Portuguese translation
+ Teerachai Yongchaitrakul for her translation into Thai
+ Sander van Yperen for his Dutch translation of the language core file
+ Shi Jian Zhuang for his translation into traditional Chinese
+ Jan Zonjee for his Dutch translation of the language help file
+ all the people out there on the net for providing us with feedback and support

_____________________________________________
14. History of version changes ("change log")

Please refer to the file "history.txt".

_____________________
15. License agreement

Please refer to the file "license.txt".