The attention mechanism has become an almost ubiquitous architectural component in deep learning. One of its distinctive features is that it computes a non-negative probability distribution to re-weight input representations. This work reconsiders attention weights as bidirectional coefficients rather than probabilistic measures, with potential benefits in interpretability and representational capacity. After analyzing how attention scores evolve under backward gradient propagation, we propose a novel activation function, TanhMax, which possesses several favorable properties that satisfy the requirements of bidirectional attention. We conduct a battery of experiments on both text and image datasets to validate our analyses and the advantages of the proposed method. The results show that bidirectional attention is effective in revealing the semantics of input units, produces more interpretable explanations, and increases the expressive power of attention-based models.
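To make the contrast with softmax concrete, the sketch below shows one plausible form of a signed, tanh-based attention activation. It is a minimal illustration only: the function name `tanhmax`, the L1 normalization step, and the `eps` stabilizer are assumptions for exposition, not necessarily the exact definition of TanhMax given in the paper.

```python
import torch

def tanhmax(scores: torch.Tensor, dim: int = -1, eps: float = 1e-9) -> torch.Tensor:
    """Illustrative signed attention activation (assumed form, not the paper's exact one).

    Unlike softmax, the outputs may be negative, so an input unit can
    contribute with either sign (a "bidirectional" coefficient).
    """
    t = torch.tanh(scores)                                   # signed values in (-1, 1)
    return t / (t.abs().sum(dim=dim, keepdim=True) + eps)    # scale so |weights| sum to ~1

# Illustrative comparison on a small score vector.
scores = torch.tensor([2.0, -1.0, 0.5])
print(torch.softmax(scores, dim=-1))  # all weights non-negative, sum to 1
print(tanhmax(scores))                # weights may be negative
```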