Ferragina P., Nitto I., Venturini R.
Compression Computer Science - Data Structures and Algorithms Computer Science - Information Theory General Computer Science Information Theory (cs.IT) E.4 Coding and Information Theory FOS: Computer and information sciences Analysis of Algorithms and Problem Complexity Computer Science Applications Dynamic Programming Applied Mathematics Information theory Data Structures and Algorithms (cs.DS)
In this paper we investigate the problem of partitioning an input string T in such a way that compressing individually its parts via a base-compressor C gets a compressed output that is shorter than applying C over the entire T at once. This problem was introduced in Buchsbaum et al. (Proc. of 11th ACM-SIAM Symposium on Discrete Algorithms, pp. 175-184, 2000; J. ACM 50(6):825-851, 2003) in the context of table compression, and then further elaborated and extended to strings and trees by Ferragina et al. (J. ACM 52:688-713, 2005; Proc. of 46th IEEE Symposium on Foundations of Computer Science, pp. 184-193, 2005) and Mäkinen and Navarro (Proc. of 14th Symposium on String Processing and Information Retrieval, pp. 229-241, 2007). Unfortunately, the literature offers poor solutions: namely, we know either a cubic-time algorithm for computing the optimal partition based on dynamic programming (Buchsbaum et al. in J. ACM 50(6):825-851, 2003; Giancarlo and Sciortino in Proc. of 14th Symposium on Combinatorial Pattern Matching, pp. 129-143, 2003), or few heuristics that do not guarantee any bounds on the efficacy of their computed partition (Buchsbaum et al. in Proc. of 11th ACM-SIAM Symposium on Discrete Algorithms, pp. 175-184, 2000; J. ACM 50(6):825-851, 2003), or algorithms that are efficient but work in some specific scenarios (such as the Burrows-Wheeler Transform, see e.g. Ferragina et al. in J. ACM 52:688-713, 2005; Mäkinen and Navarro in Proc. of 14th Symposium on String Processing and Information Retrieval, pp. 229-241, 2007) and achieve compression performance that might be worse than the optimal-partitioning by a Ω(log n/log log n) factor. Therefore, computing efficiently the optimal solution is still open (Buchsbaum and Giancarlo in Encyclopedia of Algorithms, pp. 939-942, 2008). In this paper we provide the first algorithm which computes in O(nlog 1+ε n) time and O(n) space, a partition of T whose compressed output is guarantee
Source: Algorithmica 61 (2011): 51–74. doi:10.1007/s00453-010-9437-6
Publisher: Springer Science + Business Media, New York , Stati Uniti d'America
@article{oai:it.cnr:prodotti:68504, title = {On optimally partitioning a text to improve its compression}, author = {Ferragina P. and Nitto I. and Venturini R.}, publisher = {Springer Science + Business Media, New York , Stati Uniti d'America}, doi = {10.1007/s00453-010-9437-6 and 10.48550/arxiv.0906.4692 and 10.1007/978-3-642-04128-0_38}, journal = {Algorithmica}, volume = {61}, pages = {51–74}, year = {2011} }