Forum Replies Created
Depending on the combination of Linux distribution, Qt library version, and local timezone (> UTC), this problem might appear. We have modified HAAst to detect this combination of libraries and Linux and avoid the problem. Please upgrade to HAAst version 2.3.1.16 or later and the problem should be resolved.
in reply to: Problem starting Asterisk on SystemD+Initd system #6659
Given the growing popularity of SystemD (most new Linux distros use it), as of version 2.3.1.15 HAAst changed how it interacts with system services: HAAst now uses SystemD as the default.
More specifically, if HAAst detects that a PBX’s Linux uses SystemD, then it will start and stop services using systemctl. If the PBX’s Linux does not use SystemD, then HAAst will start and stop services using initd scripts.
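The detection rule above can be illustrated in a few lines of shell. This is not HAAst’s code, only a minimal sketch assuming the common systemd check: the directory /run/systemd/system exists only when the host was booted with systemd (the same test sd_booted(3) performs). The function name service_command is hypothetical, and it prints the command instead of running it so it is safe to try.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of HAAst) showing the systemd-vs-initd choice.
# /run/systemd/system exists only on hosts booted with systemd (see sd_booted(3)).
# SYSTEMD_RUNDIR can be overridden, e.g. for testing.
SYSTEMD_RUNDIR="${SYSTEMD_RUNDIR:-/run/systemd/system}"

# Print (rather than run) the command that would start/stop a service.
service_command() {
    local action="$1" service="$2"
    if [ -d "$SYSTEMD_RUNDIR" ]; then
        echo "systemctl $action $service"
    else
        echo "/etc/init.d/$service $action"
    fi
}
```

For example, ‘service_command stop asterisk’ prints ‘systemctl stop asterisk’ on a SystemD host and ‘/etc/init.d/asterisk stop’ elsewhere.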
If you find that this change broke your system, then the simplest solution is to create a SystemD service file for Asterisk on your system, and remove/rename the initd script for Asterisk (after disabling the Asterisk initd service). Have a look at this topic https://autocommander.aws2.ocg.ca/search/Can%5C%27t+start+Asterisk+exit+code+158/ for an example asterisk.service file.
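If you prefer to write the unit file yourself, a minimal sketch follows. The binary path /usr/sbin/asterisk, the asterisk user and group, and the function name write_asterisk_unit are assumptions; compare the result against the example file in the linked topic before relying on it.

```shell
#!/usr/bin/env bash
# Sketch only: write a minimal asterisk.service unit file.
# ASSUMPTIONS: binary at /usr/sbin/asterisk, runs as user/group "asterisk".
UNIT_DIR="${UNIT_DIR:-/etc/systemd/system}"

write_asterisk_unit() {
    cat > "$UNIT_DIR/asterisk.service" <<'EOF'
[Unit]
Description=Asterisk PBX
After=network.target

[Service]
Type=simple
# -f keeps Asterisk in the foreground so systemd can track the process
ExecStart=/usr/sbin/asterisk -f -U asterisk -G asterisk
ExecStop=/usr/sbin/asterisk -rx 'core stop now'
ExecReload=/usr/sbin/asterisk -rx 'core reload'

[Install]
WantedBy=multi-user.target
EOF
}
```

Run it as root after disabling the init script (for example ‘chkconfig asterisk off’ on RHEL/CentOS or ‘update-rc.d asterisk disable’ on Debian/Ubuntu) and renaming /etc/init.d/asterisk, then run ‘systemctl daemon-reload’. The sketch deliberately does not run ‘systemctl enable asterisk’: HAAst, not the boot process, should decide when Asterisk starts.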
Dual active contention means that both peers in the cluster are active, and one or both peers discover that the other is active as well (they are contending). Upon discovering this situation the peers automatically negotiate which should remain active, and which should demote itself.
At a high level, the cause of this problem is that the (previously) standby peer thought that the other peer was dead/unresponsive and needed to take over, so it promoted itself to active. However, the real question that needs answering is why that peer thought the other was dead/unresponsive.
The most common causes are:
- HAAst Misconfiguration: If one of the peers continually cycles back and forth between active and standby, then the most likely cause is peerlink misconfiguration in haast.conf (peer A can talk to peer B, but peer B can’t talk to peer A). To resolve this, carefully check the settings in the peerlink stanza of haast.conf. If they look correct, perform telnet connectivity tests to the HAAst port from each peer to the other, run continual ping tests (using only management IPs) between the peers, and watch what happens before and after a contention.
- Network Misconfiguration: If one of the peers occasionally cycles back and forth between active and standby, then the most likely cause is network misconfiguration. Again, peer A can talk to peer B, but peer B can’t talk to peer A. To resolve this, ensure that the network settings at the OS level are correct, and that the network settings in haast.conf (voipnic stanza) are correct. This includes checking default routes (which may change if using a shared IP), accidentally reusing an IP address already in use, etc. If they look correct, run continual ping tests (using only management IPs) between the peers and watch what happens before and after a contention.
- Peer Load/Responsiveness: If one of the peers suffers from periodic extreme load, then HAAst will correctly assess its health as failing and allow the other peer to take over. To diagnose this, examine both hosts for CPU load, runaway processes, high IO processes, etc. For example, the backup script included in FreePBX is poorly written and will cause very high CPU and/or IO load when it runs (causing the PBX to become unresponsive briefly). To resolve the problem, identify the process(es) or device(s) causing the high load and correct their behavior (e.g.: switch to a real backup program).
- LAN/WAN Latency: In cases where peers are separated by large geographic distances, the maximum latency setting in haast.conf may be set too low. On rare occasions, an overloaded or problematic LAN can cause the same problem. Although you can accommodate the problem by increasing HAAst’s maximum latency setting, this is not always desirable: be sure to understand the implications for detection and fail-over time (in legitimate peer failure situations). As well, if you are running the Commercial Unlimited edition of HAAst, then latency is already being compensated for dynamically – so the maximum latency setting will reflect how severe the problem really is, and may warrant a general network diagnostic.
- Network Interruption: This is actually not a problem. It means there was a network outage (one node could not reach the other), and the standby node correctly promoted itself. Once network connectivity was restored, it demoted itself. If this happens rarely, then there is nothing you need to do – HAAst is doing its job! If it happens frequently, then you should investigate the cause of the network outages/intermittency.
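The telnet and continual ping tests from the first two items can be scripted so that each peer keeps a timestamped record to compare after a contention; the logged round-trip times also help with the latency item. A minimal sketch, assuming Linux (iputils) ping output and bash’s /dev/tcp feature as a stand-in for telnet. The peer IP and HAAst port are placeholders you take from the peerlink stanza of haast.conf, and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Connectivity monitor sketch (not part of HAAst).

# Extract the round-trip time in ms from Linux (iputils) ping output on stdin.
parse_rtt() {
    sed -n 's/.*time=\([0-9.]*\) ms.*/\1/p'
}

# One timestamped ping: prints the RTT, or LOST if no reply within 1 second.
ping_once() {
    local ip="$1" rtt
    rtt=$(ping -c 1 -W 1 "$ip" 2>/dev/null | parse_rtt)
    echo "$(date '+%F %T') ping $ip rtt_ms=${rtt:-LOST}"
}

# Rough equivalent of a telnet test: succeeds if a TCP connection opens within 3 s.
tcp_open() {
    timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Loop forever, logging one ping and one TCP check per second.
monitor_peer() {
    local ip="$1" port="$2"
    while true; do
        ping_once "$ip"
        if ! tcp_open "$ip" "$port"; then
            echo "$(date '+%F %T') tcp $ip:$port FAILED"
        fi
        sleep 1
    done
}
```

Run it on each peer against the other’s management IP, for example ‘monitor_peer 10.0.0.2 PORT | tee -a peer-monitor.log’ (substituting the real IP and port), then line up both logs around the time of a contention.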
This type of problem can be one of the most challenging to resolve. You will need to enable full debugging in the HAAst logs, as well as system logs, to capture the data needed to diagnose. You may need to involve your network admin, and possibly your WAN carrier. Telium will often work with clients through SSH to help identify the root cause, and suggest a resolution.
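To see what each peer was doing when a contention occurred (for example, a backup job driving up load), a periodic snapshot can run alongside the HAAst debug logs. A minimal sketch, assuming procps ‘ps’; the function names and the 5-second default interval are arbitrary choices.

```shell
#!/usr/bin/env bash
# Load snapshot sketch: timestamped load average plus the busiest processes.

snapshot_load() {
    echo "== $(date '+%F %T') loadavg: $(cut -d ' ' -f 1-3 /proc/loadavg)"
    # Top 5 processes by CPU. A STAT of D (uninterruptible sleep) usually
    # means the process is waiting on disk or other I/O.
    ps -eo pid,stat,pcpu,pmem,comm --sort=-pcpu | head -n 6
}

# Append a snapshot to a log file every few seconds until interrupted.
log_load() {
    local logfile="$1" interval="${2:-5}"
    while true; do
        snapshot_load >> "$logfile"
        sleep "$interval"
    done
}
```

Start ‘log_load /var/tmp/load.log’ on both peers and compare the entries just before a contention with the HAAst log.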
The Free Edition of PBXSync will synchronize Asterisk configuration files, but not information held in SQL databases. So your Asterisk configuration files (in /etc/asterisk) will synchronize and the nodes will work properly, but if you edit the configuration through FreePBX on one peer you will not see these changes appear in the GUI of the other peer.
To synchronize the SQL database you will need a commercial unlimited edition of PBXSync.
The Free Edition of HAAst will synchronize Asterisk configuration files, but not information held in SQL databases. So your Asterisk configuration files (in /etc/asterisk) will synchronize and the peers will work properly, but if you edit the configuration through FreePBX on one peer you will not see these changes appear in the GUI of the other peer.
To synchronize the SQL database you will need a commercial edition of HAAst.
in reply to: FreePBX cluster fail over at the same time everyday #6658
FreePBX includes a backup script that can consume 100% of CPU and disk resources. HAAst will accurately detect that Asterisk has become unresponsive (during this backup window) and will correctly initiate a failover. We recommend that you disable the automatic launch of the backup script from within FreePBX. Then you can either launch the FreePBX backup script from cron using ionice and nice so that it behaves properly (better), or switch to a professional backup program (eg: Backup Exec Linux Agent).
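A sketch of the nice/ionice approach: the hypothetical wrapper run_low_priority runs any command at the lowest CPU priority and, where ionice is available, in the idle I/O class. Note that the idle class is only honoured by I/O schedulers that support priorities (such as CFQ or BFQ). The backup command itself differs between FreePBX versions, so BACKUP_COMMAND below is a placeholder.

```shell
#!/usr/bin/env bash
# Run a command at the lowest CPU priority and, if possible, idle I/O priority.
# Example /etc/cron.d entry (BACKUP_COMMAND is a placeholder for the backup
# command of your FreePBX version):
#   30 2 * * * root nice -n 19 ionice -c 3 BACKUP_COMMAND
run_low_priority() {
    if command -v ionice >/dev/null 2>&1; then
        # -c 3 = idle class: disk time only when no other program wants it
        nice -n 19 ionice -c 3 "$@"
    else
        nice -n 19 "$@"
    fi
}
```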
Assuming this is not a switch issue (which you confirmed by the ARP packet missing from the packet capture), we can diagnose the problem as follows:
- In one shell, start a simple packet capture as follows:
  tcpdump -i ethX -vvv -x arp
  - ethX is the ‘physicaldevice’ setting in haast.conf, which should match the ethernet adapter you are using for VoIP traffic.
- In another shell, issue an arping from the command line (exactly as follows):
  arping -U -I ethX -c 5 sharedIP
  - ethX is the same ‘physicaldevice’ adapter as above.
  - If ‘vlanid’ is set in haast.conf, “.haast” is appended to the ethX adapter name.
  - sharedIP is the IP address moving between peers, set in the ‘address’ key of the ‘voipnic’ stanza of haast.conf.
- Stop the tcpdump and show the interface details using ifconfig:
  ifconfig
- Post the full output of the commands and responses from all 3 steps above. If any of this information is security sensitive (eg: a public IP is in the output), email it to support@autocommander.aws2.ocg.ca; do not obfuscate the data.
If traffic did not start to flow with the above command (and assuming there were no errors reported above), can you try a variation of the arping syntax that consistently works for you? Please post what worked for you.
If the above arping command generated any errors, then we can offer a workaround. Although rare, we have seen the above command syntax fail on a Linux distro that customized its arping command. Which distro, and which arping (from which package), are you using? And what exact syntax would allow the arping command to succeed? If we see enough need for that particular arping syntax, we’ll add support for it directly to HAAst. As well, we can offer a workaround if that’s the only arping package available for your distro.
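If it is easier to run the diagnostic steps from a single shell, they can be wrapped as below. This is a sketch only: it must run as root, the function names and output file name are hypothetical, it follows the interface naming described above (“.haast” appended when ‘vlanid’ is set), and it uses the same interface for the capture and the arping.

```shell
#!/usr/bin/env bash
# ARP diagnostic sketch (run as root). Not part of HAAst.

# Interface name: physicaldevice, plus ".haast" if a vlanid is configured.
haast_iface() {
    local physicaldevice="$1" vlanid="$2"
    if [ -n "$vlanid" ]; then
        echo "${physicaldevice}.haast"
    else
        echo "$physicaldevice"
    fi
}

collect_arp_diag() {
    local iface="$1" shared_ip="$2" out="${3:-arp-diag.txt}" capture_pid

    # Step 1: capture ARP traffic in the background for up to 15 seconds.
    timeout 15 tcpdump -i "$iface" -vvv -x arp > "$out.tcpdump" 2>&1 &
    capture_pid=$!
    sleep 2   # give tcpdump a moment to start listening

    # Step 2: send 5 unsolicited ARP updates for the shared IP.
    {
        echo "### arping -U -I $iface -c 5 $shared_ip"
        arping -U -I "$iface" -c 5 "$shared_ip"
    } > "$out" 2>&1

    # Step 3: wait for the capture to end, then record interface details.
    wait "$capture_pid"
    {
        echo "### tcpdump -i $iface -vvv -x arp"
        cat "$out.tcpdump"
        echo "### ifconfig"
        ifconfig 2>/dev/null || ip addr show
    } >> "$out" 2>&1
    echo "Output written to $out"
}
```

For example: ‘collect_arp_diag "$(haast_iface eth0 '')" 192.0.2.10’, substituting your own physicaldevice, vlanid and shared IP.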
in reply to: Checking phone number for fraud #6654
Using SecAst’s telnet interface, you can issue commands to communicate with Telium’s Fraud Database (use the ‘help frauddb’ command to see the exact syntax). For example:
SecAst>frauddb check 1234567890
Note that you must have a valid maintenance agreement in place to access the Fraud Database. If your maintenance agreement has expired then SecAst will continue to operate normally but without use of the Fraud Database.
in reply to: I use fail2ban, why do I need SecAst #6653
First of all, you should be aware that Fail2Ban is not a security system – it depends completely on Asterisk to report that a user attempted to register/dial without a valid account. Fail2Ban has no intrusion detection, no hacking detection, no geofencing, no fraud pattern detection, etc. It is simply a tool that reads log files to determine whether an IP should be banned. Digium warns users not to use Fail2Ban as a security measure (see http://forums.asterisk.org/viewtopic.php?p=159984). To underscore Digium’s point, most SIP attacks don’t even show up in the Asterisk log files, so these attackers are not stopped by Fail2Ban.
Fail2Ban is certainly better than nothing – so if you don’t want to use SecAst (even the Free Edition of SecAst), then install Fail2Ban. If all you want is Asterisk log trawling, SecAst can respond to these same messages from Asterisk if you choose, just like Fail2Ban, but that is among the least significant features of SecAst. SecAst uses event information from the Asterisk AMI, data from the network interface card, SIP data (including dialed digits, rate of dialing, etc.), and more to create a profile of each user/device and identify potential hacking and fraud. SecAst also uses proprietary databases of phone numbers used in fraud, known source IP addresses of telecom hackers or intrusion attempts, and all IP addresses mapped to cities/regions/countries/continents worldwide to dramatically reduce the risk of fraud or intrusion. SecAst even uses heuristic detection (like antivirus software) to identify behavioral patterns indicative of hacking attempts, or of calls being made using stolen credentials. And finally, SecAst continually monitors endpoint activities (even after registration) to protect the PBX and stop fraud.
So comparing Fail2Ban to SecAst is like comparing a screwdriver to a toolbox full of tools. Many of our customers have come to SecAst from Fail2Ban after their first $100,000 bill from their ITSP. Products like FreePBX tend to give users a false sense of security by calling Fail2Ban their “security system” – which it is not. Digium makes it quite clear that if you think Fail2Ban is a security system, then you risk being hacked / defrauded.
There are several potential causes for this problem, but the most likely is that a switch somewhere between your PBXs and your default gateway is not updating its ARP table. The ARP table associates your IP address with your MAC address, so the switch is still sending traffic for the shared IP address to the old PBX’s MAC address (which is no longer active).
When sharing an IP, you should configure HAAst to issue ‘ARP Updates’ every time the shared IP address moves. This is configured in the ‘voipnic’ stanza of the haast.conf file, with the ‘arpupdate’ key (set it to true). Once it is set to true, HAAst will broadcast to all switches, routers, etc. that the IP address has moved and is now associated with a new MAC address.
This setting solves the problem 99% of the time; however, your switch may be ignoring the update. This might happen for one of several reasons:
- Switch Security Lockdown: To prevent malicious ARP attacks some switches have locked ARP tables. This means that the network administrator must allow the switch to accept ARP updates for the IP in question.
- Switch Security Limits: Some switches limit the number of ARP updates to X per minute. If you are experimenting with failover you may have reached the security limit of your switch. Again, the network administrator has to allow more frequent ARP updates for that IP/MAC.
- Buggy Switch Firmware: Some (particularly old HP or cheap no-name) switches do not handle ARP updates properly. The only solution is to update the switch firmware or look for a new switch.
If you are running HAAst in a cloud/hosting data center, it is common for the data center to lock down ARP tables to prevent malicious/misbehaving clients from affecting their general network. In such cases you will have to notify the data center admin of why you need to permit ARP updates, and possibly for which MAC/IP addresses. Most commercial data centers understand high availability and will have no problem accommodating your request.
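A rough way to check whether ARP updates are getting through on your segment is to look at a neighbouring Linux host’s ARP cache after a failover. This inspects a host rather than the switch itself (switch commands are vendor specific), but if that host still maps the shared IP to the old peer’s MAC, the update is not reaching it. The function names are hypothetical; ‘ip neigh show’ comes from iproute2.

```shell
#!/usr/bin/env bash
# Sketch: read which MAC a Linux host currently associates with an IP.
# Run it on another host in the same network segment after a failover.

# Print the MAC that follows "lladdr" in `ip neigh show <ip>` output on stdin.
parse_lladdr() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "lladdr") { print $(i + 1); exit } }'
}

# Deliberately does not ping the IP first: a fresh ARP request would replace
# a stale entry and hide the problem being looked for.
neighbour_mac() {
    ip neigh show "$1" | parse_lladdr
}
```

Compare the output of ‘neighbour_mac SHAREDIP’ with the MAC address of the now-active peer’s VoIP adapter (shown by ifconfig on that peer).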
in reply to: PHP / 500 error viewing some pages #6649
The problem you are experiencing is most likely due to a PHP caching/optimization program installed on your server. For example, APC (Alternative PHP Cache) has some bugs that cause it to cache an included file and then try to include it again (resulting in a PHP redefinition error, require_once error, etc). For details of the APC bug and possible solutions, check out this link: https://pantheon.io/docs/alternative-php-cache/
There are caching modules from other vendors (eg: Zend) with similar issues, so you may also wish to disable caching of SecAst files; they don’t create much load on a server anyway (relatively static, low volume). This is not a SecAst bug, but future versions of SecAst will try to detect the caching software and work around the issue.
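To see which caching modules PHP has loaded, you can filter the output of ‘php -m’. A minimal sketch: the list of module names matched is an assumption covering common caches, and the PHP CLI may load a different php.ini than your web server, so confirm with a phpinfo() page. If OPcache turns out to be involved, its opcache.blacklist_filename directive can exclude specific files from caching (APC offers apc.filters).

```shell
#!/usr/bin/env bash
# Sketch: list loaded PHP opcode/user caches from `php -m` output on stdin.
# Case-insensitive match; duplicates removed while keeping the original order.
detect_php_cache() {
    grep -i -E '^(apc|apcu|zend opcache|xcache|eaccelerator)$' |
        awk '!seen[tolower($0)]++'
}
```

Usage: ‘php -m | detect_php_cache’. An empty result means none of the listed caches is loaded by the CLI.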
in reply to: Asterisk + FreePBX 10 not shutting down on demotion #6648
Seeing a status from systemctl that the Asterisk service is dead is not necessarily a problem. Systemd reports on the services it starts and stops, and since HAAst does not start/stop Asterisk through systemd, it’s normal to see a message like this. More specifically, each Linux distro + FreePBX distro + Asterisk combination will report different status results for individual services, so a systemctl status report that something is dead/not started/started may be misleading.
If you are sure that the Asterisk service should be stopped (based on the status of the peer), then we recommend paring your environment back to a single simple PBX (stripping down the layers). This will let you trace the problem down to a single cause:
1. Disable the HAAst layer
   A. Power down the remote PBX (peer). Now we’re working with only the local peer (fewer variables to check).
   B. Disable the haast service and reboot the local peer.
   C. After the reboot, check whether Asterisk is running.
   D. Check the haast log to ensure HAAst did not start.
   E. If HAAst and Asterisk are both stopped, let’s try operating on FreePBX directly (proceed to step 2).
   F. If Asterisk is running, that means you forgot to disable the automatic start of Asterisk/FreePBX as outlined in the installation guide.
2. Test the FreePBX layer directly
   A. Start FreePBX with ‘fwconsole start’. Were there any warnings? Is Asterisk running normally?
   B. Now stop FreePBX with ‘fwconsole stop’. Were there any warnings? Is Asterisk stopped?
   C. If you see FreePBX errors, correct them, reboot the PBX, and return to step 2A.
   D. If you don’t see any errors, then HAAst is having trouble controlling FreePBX. Check the ‘distribution’ setting in the haast.conf file.
   E. If the haast.conf settings are correct, test the Asterisk layer directly (proceed to step 3).
3. Test the Asterisk layer directly
   A. Ensure the FreePBX service / start command is disabled as per the HAAst installation guide.
   B. Reboot the PBX.
   C. If Asterisk is started, that means you forgot to disable the automatic start of Asterisk/FreePBX as outlined in the installation guide.
   D. Start and stop the Asterisk service using the ‘service’ or ‘systemctl’ commands appropriate for your distribution. Did this show an error?
   E. If you got to this point, contact Telium support for assistance through SSH.
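Parts of steps 1 and 2 can be scripted as below. This is a sketch, not part of HAAst: the function names are hypothetical, it assumes ‘pgrep’ (procps) and FreePBX’s ‘fwconsole’ are installed, and it only reports what it observes; interpreting any warnings is still up to you.

```shell
#!/usr/bin/env bash
# Diagnostic sketch for the layered checks above (run as root).
# Step 1 (manual): disable the haast service (assumed to be named "haast"),
# reboot, then run asterisk_state; it should print "stopped".

# Report whether an Asterisk process is running.
asterisk_state() {
    if pgrep -x asterisk >/dev/null; then echo running; else echo stopped; fi
}

# Step 2: start and stop FreePBX directly, reporting Asterisk's state each time.
test_freepbx_layer() {
    fwconsole start || echo "WARNING: fwconsole start returned an error"
    echo "after start: Asterisk is $(asterisk_state)"
    fwconsole stop || echo "WARNING: fwconsole stop returned an error"
    echo "after stop: Asterisk is $(asterisk_state)"
}
```

Expected on a healthy system: “running” after start and “stopped” after stop. Anything else points back to step 2C or 2D.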
Most users find a FreePBX internal problem and, upon resolving it, all works fine again. If you are new to FreePBX, you will discover (via Google) that these types of problems with FreePBX are well documented. FreePBX may fail to start, fail to stop, etc., which also blocks the Asterisk process from starting/stopping. There is nothing Telium/HAAst can do about this (i.e. it is a FreePBX issue). But if you encounter symptoms as described in the original question, then try this procedure to help diagnose the problem.
Telium can offer some suggestions on diagnosing such FreePBX issues, but tracking down the cause of FreePBX error messages can be time consuming. Please note that other distributions such as xCALLY Motion do not encounter this type of problem.
in reply to: Can’t find Qt prerequisite package #6647
Most Linux distributions are starting to include Qt version 5.7 (as of December 2016), so check the available repos (including testing repos) first to see whether it is there. If you can find Qt version 5.5 or later as a package, it’s best to install it from your package manager. Otherwise, continue below.
If you are running Ubuntu you can check this link for a step-by-step guide to installing Qt 5.x: http://sourcedigit.com/19858-how-to-install-qt-5-6-1-on-ubuntu-16-04/
If your system runs headless (i.e. no graphical shell), then you can also modify the Qt installer to run without its GUI as described here:
http://stackoverflow.com/questions/25105269/silent-install-qt-run-installer-on-ubuntu-server
If the above suggestions don’t work, then we recommend you download ‘Qt Creator’ directly from http://www.qt.io. This package is overkill, but it does an excellent job of installing everything you need (and more) relating to Qt.
After that you should have Qt 5.7 or later installed, including other Qt dependencies listed in the installation guide.
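To check which Qt version is actually installed, a small helper can compare the reported version against the minimum you need. A minimal sketch, assuming Qt was installed with its pkg-config files (module Qt5Core); ‘qmake -query QT_VERSION’ is an alternative on some systems. The function names are hypothetical, and version_ge relies on GNU ‘sort -V’.

```shell
#!/usr/bin/env bash
# Sketch: compare the installed Qt 5 version against a minimum.

# True if version $1 >= version $2 (uses GNU sort -V).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

check_qt() {
    local min="$1" have
    have=$(pkg-config --modversion Qt5Core 2>/dev/null) || have=""
    if [ -z "$have" ]; then
        echo "Qt5Core not found via pkg-config"
        return 1
    fi
    if version_ge "$have" "$min"; then
        echo "Qt $have OK (need $min or later)"
    else
        echo "Qt $have is older than $min"
        return 1
    fi
}
```

Usage: ‘check_qt 5.5’, substituting the minimum version listed in the installation guide.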
in reply to: Asterisk still running after peer demotion #6645
Asterisk should not be running after a peer has switched to the standby state. This problem usually occurs if the wrong distribution number is selected in the [asterisk] stanza of haast.conf.
Please ensure that you are using the correct distribution number in haast.conf. If you change the distribution number then you must restart the computer (since some poorly written configuration generators remain in an unstable state if not shut down the way they want to).
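To double-check the value HAAst will read, a small awk helper can pull one key out of haast.conf. It assumes the file uses ‘[stanza]’ headers and ‘key=value’ lines; the function name haast_conf_get is hypothetical, and the location of haast.conf depends on your installation.

```shell
#!/usr/bin/env bash
# Sketch: print the value of KEY in [STANZA] of an INI-style haast.conf.
# ASSUMPTION: "[stanza]" headers and "key=value" (or "key = value") lines.
haast_conf_get() {
    local file="$1" stanza="$2" key="$3"
    awk -v stanza="$stanza" -v key="$key" '
        /^[[:space:]]*\[/ {                  # stanza header such as [asterisk]
            name = $0
            gsub(/^[[:space:]]*\[|\][[:space:]]*$/, "", name)
            in_stanza = (name == stanza)
            next
        }
        in_stanza && /=/ {
            k = $0
            sub(/=.*/, "", k)                # key is the text before "="
            gsub(/[[:space:]]/, "", k)
            if (k == key) {
                v = $0
                sub(/^[^=]*=[[:space:]]*/, "", v)
                sub(/[[:space:]]+$/, "", v)
                print v
                exit
            }
        }
    ' "$file"
}
```

For example, ‘haast_conf_get /path/to/haast.conf asterisk distribution’ prints the configured distribution number; ‘voipnic arpupdate’ in place of the last two arguments shows whether ARP updates are enabled.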
in reply to: Uploading QueueMetrics data before peer demotes #6644
You should use the Asterisk pre-stop event handler to run qloaderd (which uploads queue_log data into a MySQL database for further analysis).
Create a bash script called asterisk.stop.pre and place it in the HAAst events directory. In that script, call qloaderd with any parameters you need.
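A minimal asterisk.stop.pre might look like the sketch below. The qloaderd path and the empty argument list are placeholders (use the values from your QueueMetrics installation), the function name upload_queue_data is hypothetical, and the script must be made executable (chmod +x).

```shell
#!/usr/bin/env bash
# asterisk.stop.pre: hypothetical example of the Asterisk pre-stop handler.
# PLACEHOLDERS: the qloaderd path and its arguments depend on your install.
QLOADERD="${QLOADERD:-/usr/local/qloader/qloaderd}"
QLOADERD_ARGS=()   # add the parameters you need here

upload_queue_data() {
    if [ ! -x "$QLOADERD" ]; then
        echo "asterisk.stop.pre: $QLOADERD not found or not executable" >&2
        return 1
    fi
    "$QLOADERD" "${QLOADERD_ARGS[@]}"
}

upload_queue_data
```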