The driver will now automatically reload the firmware when there are no open
channels if the firmware reports a fatal error. If the firmware reports an
error, but it was not fatal, it will leave things running and try to reload when
all channels are shut down. The driver will also halt channel processing and
reload the firmware if a channel ever failed to be created.
The thought is that if the DTE reports a non-fatal error, I cannot be certain
what the state is, and it should be reset when possible without impacting
otherwise functioning card. If there are problems, presumably all users would
hang up and the driver will then reload the firmware.
If the error is fatal, then all processing is halted to encourage everyone to
hang up. The card is probably not working at this point anyway, so there is no
point in trying to communicate with it.
Also included in this change is a compile-time selectable debug sysfs attribute
that will allow forcing an alert condition for testing the recovery.
Signed-off-by: Shaun Ruffell <sruffell@digium.com>
Signed-off-by: Russ Meyerriecks <rmeyerriecks@digium.com>
The interrupt handler was not schedulding the deferred processing routine when
there was packets to process. I did not test the actual master branch after
editing for checkpatch compliance. Sorry.
Signed-off-by: Shaun Ruffell <sruffell@digium.com>
If the host system sends to many packets to the DTE to process, the on-card
memory can be exhausted which will result in an out of memory alert. In commit
2ac2338247, the driver will halt all communication
with the card and request a reload if any alert is received.
Now the driver will silently drop any "burst" traffic that was sent to the
transcoder as opposed to expecting the firmware to do it. There is currently a
limit of 640 samples (80ms of audio) in flight to the firmware at any one time
allowed.
Signed-off-by: Shaun Ruffell <sruffell@digium.com>
Signed-off-by: Russ Meyerriecks <rmeyerriecks@digium.com>
This helps to keep the tx descriptor ring at max capacity when the system is
otherwise loaded. Now ready packets are moved from cmd_list to the transmit
descriptor ring directly in the interrupt handler and not when the deferred
function runs.
Signed-off-by: Shaun Ruffell <sruffell@digium.com>
Signed-off-by: Russ Meyerriecks <rmeyerriecks@digium.com>
I do not have any evidence that this made a difference, but hopefully it will
clear things up for people in the future who might be wondering why the
timestamp does not increase with the number of samples actually sent.
Signed-off-by: Shaun Ruffell <sruffell@digium.com>
Signed-off-by: Russ Meyerriecks <rmeyerriecks@digium.com>
The polling interval was not fast enough to keep the tx ring full on a loaded
card. This fixes a regression introduced in commits
ba05e31c8a and
354d88cd41.
Signed-off-by: Shaun Ruffell <sruffell@digium.com>
Signed-off-by: Russ Meyerriecks <rmeyerriecks@digium.com>
When switching to polling mode it was possible that we would mask off the
receive complete interrupt until the next timer fired. Now go ahead and handle
anything we know how to handle regardless of the current mask.
Also, no need to update the reg local anymore since it isn't used to ack any
interrupts. We now always ack all the interrupts first and inspect them all.
Signed-off-by: Shaun Ruffell <sruffell@digium.com>
Signed-off-by: Russ Meyerriecks <rmeyerriecks@digium.com>
re-organize calls so worker_reset() isn't called twice
(was called from xbus_disconnect() and worker_destroy())
Signed-off-by: Tzafrir Cohen <tzafrir.cohen@xorcom.com>
* Maintain a "shutting_down" flag per-xbus
* Use it to prevent xbus dereferencing (via xbus_get()/xbus_put())
during an xbus shutdown.
* Also, remove xbus from global array earlier.
Signed-off-by: Tzafrir Cohen <tzafrir.cohen@xorcom.com>
waitfor_xpds xbus sysfs file should not take an xbus refcount:
* It is called from sysfs which maintain its own device refcount.
* If put_xbus() calls xbus_destroy() than down the call chain it will
try to release an object that is held by sysfs.
* This will create a deadlock.
Signed-off-by: Tzafrir Cohen <tzafrir.cohen@xorcom.com>
1) Enabling multiple csm_encaps channel commands in a single packet.
2) Sending commands to separate channels in parallel.
This reduces the time waiting for the responses to the commands and brings in
the channel setup from 50ms to under 10ms.
Signed-off-by: Shaun Ruffell <sruffell@digium.com>
Signed-off-by: Russ Meyerriecks <rmeyerriecks@digium.com>
interruptible_sleep_on_timeout() has been deprecated for awhile and was finally
removed in Linux 3.15. Since interruptible_sleep_on_timeout() uses jiffies for
the delay, I assumed that each jiffy equated to 10ms given the age of the
driver.
Signed-off-by: Shaun Ruffell <sruffell@digium.com>
Acked-by: Tzafrir Cohen <tzafrir.cohen@xorcom.com>
Signed-off-by: Russ Meyerriecks <rmeyerriecks@digium.com>
Some architectures, like arm, do not automatically pull in the definitions for
kzalloc and friends. This allows DAHDI to build on those platforms.
Originally reported to the asterisk-users mailing list here
http://lists.digium.com/pipermail/asterisk-users/2014-February/282338.html
Signed-off-by: Shaun Ruffell <sruffell@digium.com>
Acked-by: Tzafrir Cohen <tzafrir.cohen@xorcom.com>
Signed-off-by: Russ Meyerriecks <rmeyerriecks@digium.com>
This patch does a couple of things. It adds a new DEBUG mode where packet
statistics are printed when channels are closed which can be used to track where
packets might be lost in the transcoding chain.
This patch will also print to the kernel log if the AN983 has detected any
errored received packets. Problems of this type are typically system problems,
like when the card is having trouble DMAing packets.
Internal-Issue-ID: DAHDI-1071
Signed-off-by: Shaun Ruffell <sruffell@digium.com>
Signed-off-by: Russ Meyerriecks <rmeyerriecks@digium.com>
Keeping the transmit descriptor ring shorter reduces the time it takes to send
CSM_ENCAP commands to the transcoding engine when the card is otherwise busy.
Signed-off-by: Shaun Ruffell <sruffell@digium.com>
Signed-off-by: Russ Meyerriecks <rmeyerriecks@digium.com>
It is only modified under the chanlock anyway.
Signed-off-by: Shaun Ruffell <sruffell@digium.com>
Signed-off-by: Russ Meyerriecks <rmeyerriecks@digium.com>
Simplifies the logic when polling is enabled. No need to worry about any system
factors when scheduling the default kernel timer.
Signed-off-by: Shaun Ruffell <sruffell@digium.com>
Signed-off-by: Russ Meyerriecks <rmeyerriecks@digium.com>
Otherwise, if there are many RTP commands queued on the command list, some of
the CSM_ENCAP packets, like ACKS, weren't being sent to the firmware within the
timeout value.
Signed-off-by: Shaun Ruffell <sruffell@digium.com>
Signed-off-by: Russ Meyerriecks <rmeyerriecks@digium.com>
Not only does this make it atomic when moving commands from the
waiting_for_response_list to the command_list if the descriptor is full, it will
also make the entire process of submitting a packet the packet transmission
logic atomic.
Signed-off-by: Shaun Ruffell <sruffell@digium.com>
Signed-off-by: Russ Meyerriecks <rmeyerriecks@digium.com>
This was more a holdover when the AN983 interface was brought over from the
voicebus driver.
Signed-off-by: Shaun Ruffell <sruffell@digium.com>
Signed-off-by: Russ Meyerriecks <rmeyerriecks@digium.com>
When we start the shutdown sequence for a channel, there is no need to submit
any RTP packets that are queued on the command list. Under extreme load with
many backed up RTP packets it was possible to have RTP packets submitted after
the channel shutdown process started.
Signed-off-by: Shaun Ruffell <sruffell@digium.com>
Signed-off-by: Russ Meyerriecks <rmeyerriecks@digium.com>
A small percentage of the total packets sent to the DTE ever wait for
completions. This will save on the need to keep the completion around in all the
packets.
Also, since we can use the presence of the completion as the flag whether we
intend to auto free, we can simplify the flags as well.
Signed-off-by: Shaun Ruffell <sruffell@digium.com>
Signed-off-by: Russ Meyerriecks <rmeyerriecks@digium.com>
While not required by the protocol to the DTE, this does help when debugging the
trace files.
Signed-off-by: Shaun Ruffell <sruffell@digium.com>
Signed-off-by: Russ Meyerriecks <rmeyerriecks@digium.com>
Furthermore, do it as soon as we know we should to prevent the ack from
potentially going out after another CSM_ENCAPS packet on another CPU.
Previously, we would not send ACKS to responses we believed we already responded
to.
Signed-off-by: Shaun Ruffell <sruffell@digium.com>
Signed-off-by: Russ Meyerriecks <rmeyerriecks@digium.com>
Eliminates some cases where there are duplicated packets in the capture if the
hardware descriptor ring was already full.
Signed-off-by: Shaun Ruffell <sruffell@digium.com>
Signed-off-by: Russ Meyerriecks <rmeyerriecks@digium.com>
Makes the channels themselves behave like the supervisor channel. This only
protects the driver in the case the commands were severly backed up, like when
there was high packet loss.
Signed-off-by: Shaun Ruffell <sruffell@digium.com>
Signed-off-by: Russ Meyerriecks <rmeyerriecks@digium.com>
They are sufficiently protected by the list locks. This also cleans up a case
where the tcp was unlocked after already completing it, which was corrupting the
list.
Signed-off-by: Shaun Ruffell <sruffell@digium.com>
Signed-off-by: Russ Meyerriecks <rmeyerriecks@digium.com>
In case we missed an alert, this will allow for rapid shutdown of Asterisk.
Signed-off-by: Shaun Ruffell <sruffell@digium.com>
Signed-off-by: Russ Meyerriecks <rmeyerriecks@digium.com>
Even if it is duplicated or we don't have an outbound message waiting, we should
ack it so that the firmware does not keep trying to send it to the host.
Otherwise the firmware could get into situations where it was constantly
retrying to send packets for which it did not receive our previous ACK and
exhaust memory.
I was only seeing this on platforms were packets were going missing in the
stream, increasing the probability that the driver would miss early responses.
Signed-off-by: Shaun Ruffell <sruffell@digium.com>
Signed-off-by: Russ Meyerriecks <rmeyerriecks@digium.com>
The kernel log will now contain reports if there are bus errors.
This is a troubleshooting aide on systems with bus issues.
Signed-off-by: Shaun Ruffell <sruffell@digium.com>
Signed-off-by: Russ Meyerriecks <rmeyerriecks@digium.com>
Clarifies that the semaphore was being used as a mutex. Mutexes are also more
efficient and allow better debugging checks.
Signed-off-by: Shaun Ruffell <sruffell@digium.com>
Signed-off-by: Russ Meyerriecks <rmeyerriecks@digium.com>
The ioctls for the debug network interface on the tc400b0 has not been used for
a long time. It is now gone.
This will also allow the sempahore set in the ioctl to be changed into a mutex
which provides enhanced debugging checks.
Signed-off-by: Shaun Ruffell <sruffell@digium.com>
Signed-off-by: Russ Meyerriecks <rmeyerriecks@digium.com>
When an unsolicited alert is received, we'll flag the card as halted so that
commands will not be retried. This is because often times the firmware will no
longer process any commands in this state and the driver will hold processes in
the dstate while waiting to retry the commands.
This is a debugging aide in that it simplies unloading the driver if the card /
driver is currently in a failed state.
Signed-off-by: Shaun Ruffell <sruffell@digium.com>
Signed-off-by: Russ Meyerriecks <rmeyerriecks@digium.com>
The read-line-multiple command was already disabled on the voicebus cards, which
use the same interface, in commit 4de462c3e0. This
does the same thing for the transcoder card and also disables the read line
command.
I've seen this change directly correlated to problems with the AN983 receiving
packets from the onboard DSP on some platforms.
Internal-Issue-ID: DAHDI-1071
Signed-off-by: Shaun Ruffell <sruffell@digium.com>
Signed-off-by: Russ Meyerriecks <rmeyerriecks@digium.com>
I've not seen this directly tied to any issue, but it's enabled on the voicebus
cards and so brings the wctc4xxp driver in line.
Signed-off-by: Shaun Ruffell <sruffell@digium.com>
Signed-off-by: Russ Meyerriecks <rmeyerriecks@digium.com>
Fixes regression from bb63d03bba (before
v2.7.0). This failed to set the PCM mask on a CAS span when
DAHDI_AUDIO_NOTIFY was not set.
As the first channel of each xbus would be enabled (for
synchronization), a single call may still have passed.
This patch sets the PCM mask on any CAS channel explicitly.
Signed-off-by: Tzafrir Cohen <tzafrir.cohen@xorcom.com>
* It's currently harmless (just re-run the pre/post XPD registrations)
* But it's cleaner this way (as with xbus_register_dahdi_device())
Signed-off-by: Tzafrir Cohen <tzafrir.cohen@xorcom.com>
XPP devices have implicit support for device registration and
unregistration. Even though it is only used for the legacy (non-hotplug)
configuration case, we still prefer to make it explicit.
This attribute would later allow a simpler implementation of the user
space (xpp-specific) tool dahdi_registration.
Signed-off-by: Tzafrir Cohen <tzafrir.cohen@xorcom.com>
In an Astribank with >= 2 PRI ports, switching from E1 to T1 at run-time
may fail at the DAHDI_CHANCONFIG ioctl on the first channel in a span,
That is, on first run of dahdi_cfg, it fails on second span, on second:
it fails on third span, etc.
The code clears the D-channel information on the DAHDI_CHANCONFIG call
for the first channel in the span.
However The code tested for the global "channo" rather than the per-span
"chanpos" to check for the first channel in the span. This the test
failed.
Signed-off-by: Tzafrir Cohen <tzafrir.cohen@xorcom.com>
This is to support users who are unable to update to the lastest CentOS 5.x.
There is no change for most users on the latest releases of their distribution.
Signed-off-by: Shaun Ruffell <sruffell@digium.com>
The timer_list member of the transcoder buffer has not been used for a long
time. Saves some space in the buffer in addition to removing any future
curiosity about what it is for.
Signed-off-by: Shaun Ruffell <sruffell@digium.com>
If the memory allocation for the new channel array fails, it would be possible
to call enable_irq() without the corresponding call to disable_irq().
Signed-off-by: Shaun Ruffell <sruffell@digium.com>
This fixes a regression introduced first in release 2.9.1 with commit
(7ce8498465 "firmware: Refactor by using build_tools/install_firmware.")
which prevents from installing the firmware in a location other than the system
root.
Bug: https://issues.asterisk.org/jira/browse/DAHLIN-337
Reported-by: Anthony Messina <amessina@messinet.com>
Signed-off-by: Tzafrir Cohen <tzafrir.cohen@xorcom.com>
[edited the commit message]
Signed-off-by: Shaun Ruffell <sruffell@digium.com>