Ing. Pavel Kubalík, Ph.D.

Publications

An In-sight into How Compression Dictionary Architecture can Affect the Overall Performance in FPGAs

Authors
Kubalík, P.; Bartík, M.; Beneš, T.
Year
2020
Published in
IEEE Access. 2020, 2020(8), 183101-183116. ISSN 2169-3536.
Type
Article
Abstract
This paper presents a detailed analysis of various approaches to hardware-implemented compression algorithm dictionaries, including our optimized method. To obtain comprehensive and detailed results, we introduced a method for the fair comparison of programmable hardware architectures to show the benefits of our approach from the perspective of logic resources, frequency, and latency. We compared two generally used methods with our optimized method, which was found to be more suitable for maintaining the memory content via (in)valid bits in any mid-density memory structures implemented in programmable hardware such as FPGAs (Field Programmable Gate Arrays). The benefits of our new method, based on a “Distributed Memory” technique, are shown on a particular example of a compression dictionary, but the method is also suitable for other use cases requiring fast (re-)initialization of the used memory structures before each run of an algorithm with minimum time and logic resource consumption. The performance evaluation of the respective approaches was carried out in the Xilinx ISE and Xilinx Vivado toolkits for the Virtex-7 FPGA family. However, the proposed approach is compatible with 99% of modern FPGAs.

Novel Partial Correlation Method Algorithm for Acquisition of GNSS Tiered Signals

Authors
Schmidt, J.; Kubalík, P.; Borecký, J.; Svatoň, J.; Vejražka, F.
Year
2020
Published in
Navigation. 2020, 2020(9), 1-18. ISSN 2161-4296.
Type
Article
Abstract
This paper presents a new modified Single Block Zero-Padding (mSBZP) Partial Correlation Method (PCM) Parallel Code Search (PCS) algorithm for effective acquisition of weak GNSS tiered signals using coherent processing of the secondary code (SC) component. Two problems are discussed: acquisition of primary codes with longer periods using FFT blocks of limited length, and the utilization of PCS in the presence of SC bit transitions. The PCM and SC bit transitions form parasitic fragments in the Cross-Ambiguity Function (CAF) that degrade signal detection performance. A novel analysis of this mechanism and its impact is presented. A novel mSBZP-PCM-PCS algorithm is proposed, which does not degrade the CAF. The algorithm is then combined with an SC bit transition removal scheme and sequential search to construct an estimator for weak tiered signal acquisition. The performance of the method is demonstrated by analysis and computer simulation using Galileo E1C and GPS L1C-P signals.

Design of a High-Throughput Match Search Unit for Lossless Compression Algorithms

Authors
Kubalík, P.; Bartík, M.; Beneš, T.
Year
2019
Published in
The 9th IEEE Annual Computing and Communication Workshop and Conference (CCWC). Piscataway: IEEE, 2019. p. 732-738. ISBN 9781728105543.
Type
Conference paper
Abstract
This paper presents an attempt to combine recent research in the fields of hardware- and software-based high-throughput universal lossless compression algorithms and their implementations, resulting in a case study focusing on one of the most critical parts of compression algorithms: a Match Search Unit (MSU) and its parallelization. The presented FPGA design combines ideas of the LZ4 algorithm (which is derived from the most common LZ77) with state-of-the-art hardware architectures for lossless compression that are also based on LZ77. This approach might lead to a smaller, better organized, or more efficient “building block” for modern implementations of hardware-driven lossless compression algorithms. The presented design focuses on optimizing the main problem of the LZ77 family, namely the construction of and searching in a compression dictionary. In particular, we combine a Live Value Table (LVT) with multi-ported memory to improve the bandwidth of the dictionary, and use the Fibonacci hashing principle originating from the LZ4 algorithm to decrease the latency of the MSU and achieve an overall higher throughput. An FPGA of the Xilinx Virtex-7 family was used for the design synthesis.
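
To illustrate the Fibonacci hashing principle named in this abstract, the following Python sketch models the multiplicative hash found in the LZ4 reference software: a 4-byte input word is multiplied by the constant 2654435761 (roughly 2^32 divided by the golden ratio), and the top bits of the 32-bit product select a dictionary slot. The function name and the 12-bit index width are assumptions made for illustration; the abstract does not state the parameters of the published hardware design.

```python
# Multiplier used by the LZ4 reference code (about 2**32 / golden ratio).
FIB_MULTIPLIER = 2654435761


def fibonacci_hash(word: bytes, hash_bits: int = 12) -> int:
    """Map a 4-byte word to a dictionary index that is `hash_bits` wide.

    `hash_bits` is an illustrative assumption, not a value from the paper.
    """
    if len(word) != 4:
        raise ValueError("expected exactly 4 bytes")
    value = int.from_bytes(word, "little")
    product = (value * FIB_MULTIPLIER) & 0xFFFFFFFF  # keep the low 32 bits
    return product >> (32 - hash_bits)  # the top bits form the index


index = fibonacci_hash(b"abcd")  # slot in a 4096-entry dictionary
```

In hardware, the multiplication by a constant and the fixed shift reduce to wiring and adders, which is presumably part of the appeal for a low-latency match search unit.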

High Throughput and Low Latency LZ4 Compressor on FPGA

Authors
Kubalík, P.; Beneš, T.; Bartík, M.
Year
2019
Published in
2019 International Conference on ReConFigurable Computing and FPGAs. Piscataway, NJ: IEEE, 2019. ISSN 2640-0472. ISBN 978-1-7281-1957-1.
Type
Conference paper
Abstract
This paper presents an FPGA design implementing a single LZ4 lossless compression IP block, providing a throughput of 6 Gbps combined with extremely low latency, while still retaining full binary compatibility with the original LZ4 format. The best-known competitor is capable of processing up to 2 Gbps per block/engine with unknown latency. The presented design uses two key features: a low-latency 8-way match search unit and consequently a match buffer which allows encoding LZ4 sequences independently to reduce stalls in the data processing pipeline. The design was evaluated on several compression corpora with an average compression ratio of 1.7.
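
For readers unfamiliar with the LZ4 sequences mentioned in this abstract, the sketch below encodes a single sequence in the LZ4 block format: a token byte (literal length in the high nibble, match length minus 4 in the low nibble), optional length extension bytes, the literals, and a 2-byte little-endian match offset. It is a software model of the format only; the function names are hypothetical, and it does not reproduce the match buffer or pipeline of the published design. The final literal-only sequence of a block is not handled.

```python
MIN_MATCH = 4  # shortest match representable in the LZ4 block format


def encode_length(extra: int) -> bytes:
    """Encode the part of a length that did not fit into a 4-bit token field."""
    out = bytearray()
    while extra >= 255:
        out.append(255)
        extra -= 255
    out.append(extra)
    return bytes(out)


def encode_sequence(literals: bytes, offset: int, match_length: int) -> bytes:
    """Encode one LZ4 sequence: token, literals, offset, match length."""
    if not 1 <= offset <= 0xFFFF:
        raise ValueError("offset must fit into 16 bits and be non-zero")
    if match_length < MIN_MATCH:
        raise ValueError("match is shorter than the LZ4 minimum")
    literal_length = len(literals)
    match_code = match_length - MIN_MATCH
    token = (min(literal_length, 15) << 4) | min(match_code, 15)
    out = bytearray([token])
    if literal_length >= 15:
        out += encode_length(literal_length - 15)
    out += literals
    out += offset.to_bytes(2, "little")
    if match_code >= 15:
        out += encode_length(match_code - 15)
    return bytes(out)


# Three literals followed by a 4-byte match located 3 bytes back.
sequence = encode_sequence(b"abc", offset=3, match_length=4)
```

Because each sequence depends only on its own literals, offset, and match length, sequences can be encoded independently once matches are known, which is the property the abstract attributes to the match buffer.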

Ultra High Resolution Jitter Measurement Method for Ethernet Based Networks

Authors
Hynek, K.; Kubalík, P.; Beneš, T.; Bartík, M.
Year
2019
Published in
The 9th IEEE Annual Computing and Communication Workshop and Conference (CCWC). Piscataway: IEEE, 2019. p. 847-851. ISBN 9781728105543.
Type
Conference paper
Abstract
This document presents a new approach to network jitter measurement and analysis in asynchronous data networks such as Ethernet. The developed monitoring device is capable of analyzing an incoming stream at a speed of 1 Gb/s with a resolution of up to 8 ns. The system architecture supports network speeds of up to 100 Gb/s. The presented architecture can provide several statistical functions, such as measuring network jitter by the Interarrival Histograms method, which also provides the mean value and the peak-to-peak value. The architecture was implemented and tested on a Xilinx Kintex UltraScale FPGA chip using the Avnet AES-KU040-DB-G development board.
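
As a rough software model of an interarrival histogram, the Python sketch below takes packet arrival timestamps in nanoseconds, computes the gaps between consecutive packets, bins them at an 8 ns resolution, and reports the mean and peak-to-peak gap. The function name, the binning rule, and the sample timestamps are assumptions for illustration; the exact definitions used by the published device are not given in the abstract.

```python
from collections import Counter


def interarrival_stats(timestamps_ns, resolution_ns=8):
    """Return (histogram, mean gap, peak-to-peak gap) for arrival timestamps.

    The histogram maps the lower edge of each `resolution_ns`-wide bin to
    the number of interarrival gaps that fall into it.
    """
    if len(timestamps_ns) < 2:
        raise ValueError("need at least two timestamps")
    gaps = [later - earlier
            for earlier, later in zip(timestamps_ns, timestamps_ns[1:])]
    histogram = Counter((gap // resolution_ns) * resolution_ns for gap in gaps)
    mean_gap = sum(gaps) / len(gaps)
    peak_to_peak = max(gaps) - min(gaps)
    return histogram, mean_gap, peak_to_peak


# Hypothetical arrival times of four packets, in nanoseconds.
histogram, mean_gap, peak_to_peak = interarrival_stats([0, 100, 200, 316])
```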

Performance Comparison of Multiple Approaches of Status Register for Medium Density Memory Suitable for Implementation of a Lossless Compression Dictionary

Authors
Kubalík, P.; Bartík, M.; Ubik, S.; Beneš, T.
Year
2018
Published in
Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays. New York: ACM, 2018. p. 290. ISBN 978-1-4503-5614-5.
Type
Conference paper
Abstract
This paper presents a performance comparison of various approaches to realizing a status register suitable for maintaining (in)valid bits in mid-density memory structures implemented in Xilinx FPGAs. An example of such a structure with a status register is a dictionary for Lempel-Ziv based lossless compression algorithms, where the dictionary has to be initialized before each run of the algorithm with minimum time and logic resource consumption. The performance evaluation of the designs was carried out in the Xilinx ISE and Vivado toolkits for the Virtex-7 FPGA. This research has been partially supported by the CTU project SGS17/017/OHK3/1T/18 "Dependable and attack-resistant architectures for programmable devices" and by the project "E-infrastructure CESNET modernization", no. CZ.02.1.01/0.0/0.0/16 013/0001797.

Proposal of a Memory Architecture for Pre and Post-Correlation coherent Processing of GNSS Signal with SoC based Acquisition Uni

Authors
Kubalík, P.; Schmidt, J.; Svatoň, J.; Vejražka, F.
Year
2018
Published in
Proceedings of the 6th Prague Embedded Systems Workshop. ČVUT v Praze, Fakulta informačních technologií, 2018. p. 21-25. ISBN 978-80-01-06456-6.
Type
Conference paper
Abstract
This contribution describes an architecture of an additional system of memories for an existing GNSS (Global Navigation Satellite Systems) signal acquisition unit operating in the frequency domain. The unit is designed for an FPGA-based HW receiver and has three 4K FFT blocks. The receiver is based on the System on Chip (SoC) Xilinx ZYNQ platform. The proposed additional memories are used as accumulators of complex signal samples and are placed before or after the acquisition unit. They make it possible to process GNSS signals of different navigation systems more effectively with limited resources.

Acquisition of Modern GNSS Signals in SoC ZYNQ with its Limited Computational Resources in Frequency Domain

Authors
Kubalík, P.; Schmidt, J.; Svatoň, J.; Vejražka, F.
Year
2017
Published in
Proceedings of the 5th Prague Embedded Systems Workshop. Praha: katedra číslicového návrhu, 2017. pp. 64-66. ISBN 978-80-01-06178-7.
Type
Conference paper
Abstract
The objective of this contribution is the design of optimal algorithms for a universal GNSS acquisition unit. The unit is designed for an FPGA-based HW receiver and is implemented in the frequency domain with three 4K FFT blocks. The unit is able to acquire common civil signals (GPS C/A, BeiDou B1, IRNSS L5/S-band, and GLONASS L1OF) directly and to acquire the longer-code Galileo E1 signal with a proposed improved partial correlation algorithm. Pre- and, mainly, post-correlation methods are analyzed and selected with respect to implementation on the target System on Chip (SoC) Xilinx ZYNQ platform with limited computing resources.

Design of a Residue Number System Based Linear System Solver in Hardware

Authors
Buček, J.; Kubalík, P.; Lórencz, R.; Zahradnický, T.
Year
2017
Published in
Journal of Signal Processing Systems. 2017, 87(3), 343-356. ISSN 1939-8018.
Type
Article
Abstract
This paper is focused on the error-free solution of dense linear systems using residual arithmetic in hardware. The designed Modular System uses identical hardware Residual Processors (RPs) for solving independent systems of linear congruences and combines their solutions into the solution of the given linear system. This approach uses the residue number system, which is based on the Chinese remainder theorem. To efficiently exploit parallel processing and cooperation of the individual components, a hardware architecture of the Modular System with several RPs is designed. To verify the proposed architecture, a Xilinx FPGA with a MicroBlaze processor was used. Experimental results are obtained for an evaluation FPGA board with a Virtex 6. The implementation results serve for subsequent theoretical analysis of the system performance for various linear system sizes and for further improvement of the system. The proposed system can be useful as a special hardware peripheral or as a part of an embedded system for solving large nonsingular systems of linear equations with integer, rational, or floating-point coefficients with arbitrary precision.
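
The combination step described in this abstract rests on the Chinese remainder theorem: results computed independently modulo several pairwise coprime moduli determine a unique result modulo their product. The Python sketch below shows only that reconstruction for a single integer; the function name and the small moduli are illustrative assumptions, and the sketch does not model the Residual Processors or the published architecture. It relies on `pow(x, -1, m)` and `math.prod`, both available from Python 3.8.

```python
from math import prod


def crt(residues, moduli):
    """Reconstruct x modulo prod(moduli) from x modulo each modulus.

    The moduli must be pairwise coprime.
    """
    modulus = prod(moduli)
    total = 0
    for residue, m in zip(residues, moduli):
        partial = modulus // m            # product of all other moduli
        inverse = pow(partial, -1, m)     # modular inverse of partial mod m
        total += residue * partial * inverse
    return total % modulus


# x = 2 (mod 3), x = 3 (mod 5), x = 2 (mod 7) gives x = 23 (mod 105).
x = crt([2, 3, 2], [3, 5, 7])
```

Each residue channel can be processed without reference to the others, which is what makes the approach attractive for several processors working in parallel.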

Methods and Hardware architecture for Multi-constellation GNSS signal acquisition unit in frequency domain

Authors
Kubalík, P.; Schmidt, J.; Svatoň, J.; Vejražka, F.
Year
2017
Published in
ENC2017_Programme_NonCopyright. Lausanne: The Swiss Institute of Navigation, 2017. pp. 252-261.
Type
Conference paper
Abstract
The objective of this contribution is the design of a universal GNSS acquisition unit for an FPGA-based HW receiver, which is capable of direct acquisition of common civil signals (GPS C/A, BeiDou B1, IRNSS L5/S-band, and GLONASS L1OF). Due to the high computational complexity and the latency requirements, processing in the frequency domain with parallel code search is adopted. Optimal processing methods are analyzed even for the long codes of Galileo E1 or future GPS L1C signals. For each block of the acquisition unit, a method is selected with respect to implementation on the target System on Chip (SoC) Xilinx ZYNQ platform. The unit is intended as a HW acquisition accelerator with minimal SW handling requirements for the developed receiver.

A Novel and Efficient Method to Initialize FPGA Embedded Memory Content in Asymptotically Constant Time

Authors
Kubalík, P.; Bartík, M.; Ubik, S.
Year
2016
Published in
ReConFig’16. Piscataway: IEEE, 2016. ISBN 978-1-5090-3707-0.
Type
Conference paper
Abstract
This paper describes the analysis and implementation of a new method for maintaining valid content of FPGA memory blocks with an asymptotically constant-time synchronous clear capability, which can be useful for (re)initialization to one default value. A particular application is high-speed real-time LZ77 lossless compression, where a dictionary has to be (re)initialized before each run of the implemented compression algorithm. The method is based on the two most widely used techniques for clearing memory content: a linear pass through the memory that clears each cell by writing a default value, and a register field that provides an (in)valid bit for each memory cell. Our solution combines these two techniques with the use of FPGA distributed memory blocks implemented in LUTs (Look-Up Tables) to overcome the negative features of each previous method without losing most of their positive features. Our solution provides a balance between the two previous techniques and exceeds them in speed, resource utilization, and latency of (re)initialization.
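
The two baseline techniques named in this abstract can be contrasted with a small software model. In the Python sketch below, `LinearClearMemory` clears by rewriting every cell, while `ValidBitMemory` keeps one valid bit per cell and clears by resetting the bits alone, so stale data may remain in the cells but is never returned. The class names and the step counts returned by `clear` are illustrative assumptions; the sketch does not reproduce the paper's combined method based on LUT distributed memory.

```python
class LinearClearMemory:
    """Clears by writing the default value into every cell."""

    def __init__(self, depth, default=0):
        self.default = default
        self.cells = [default] * depth

    def write(self, address, value):
        self.cells[address] = value

    def read(self, address):
        return self.cells[address]

    def clear(self):
        """Return the number of write steps needed (one per cell)."""
        for address in range(len(self.cells)):
            self.cells[address] = self.default
        return len(self.cells)


class ValidBitMemory:
    """Keeps a valid bit per cell; clearing resets only the bits."""

    def __init__(self, depth, default=0):
        self.default = default
        self.cells = [default] * depth
        self.valid = 0  # integer used as a bit vector, models a register field

    def write(self, address, value):
        self.cells[address] = value
        self.valid |= 1 << address

    def read(self, address):
        if (self.valid >> address) & 1:
            return self.cells[address]
        return self.default  # cell content is stale, report the default

    def clear(self):
        """Reset all valid bits at once; modelled as a single step."""
        self.valid = 0
        return 1


memory = ValidBitMemory(depth=8)
memory.write(3, 42)
steps = memory.clear()  # one step, regardless of depth
```

The model makes the trade-off visible: the linear pass costs time proportional to the memory depth, while the valid-bit register costs one flip-flop per cell, which is the tension the abstract says the combined method balances.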

Nová a efektivní metoda pro zajištení platnosti dat ve vestavných pamětech FPGA se zaměřením na kompresi IP packetů v reálném čase (A New and Efficient Method for Ensuring Data Validity in FPGA Embedded Memories with a Focus on Real-Time IP Packet Compression)

Authors
Kubalík, P.; Bartík, M.; Ubik, S.
Year
2016
Published in
Počítačové Architektury & Diagnostika PAD 2016 - Sborník příspěvků. Brno: Vysoké učení technické v Brně, 2016. p. 89-92. ISBN 978-80-214-5376-0.
Type
Conference paper
Abstract
This paper deals with a new and efficient way of ensuring data validity in embedded memories inside an FPGA, which is suitable for implementing dictionaries in lossless compression algorithms realized in hardware (FPGA). The key to this innovation is a clever use of the properties of the LUT (Look-Up Table), which makes it possible to use fewer resources and reach a higher frequency of the whole system than commonly used implementations. The method is designed for high throughput and low latency, which makes it suitable for real-time compression of jumbo IP packets containing multimedia data. The method can also be applied to other data structures that are mapped to embedded RAM blocks in an FPGA.

LZ4 Compression Algorithm on FPGA

Authors
Kubalík, P.; Bartík, M.; Ubik, S.
Year
2015
Published in
21st IEEE International Conference on Electronics, Circuits, and Systems. New York: Institute of Electrical and Electronics Engineers, 2015. p. 179-182. ISBN 978-1-4799-2451-6.
Type
Conference paper
Abstract
This paper describes the analysis and implementation of the LZ4 compression algorithm. LZ4 is derived from the standard LZ77 compression algorithm and is focused on compression and decompression speed. The LZ4 lossless compression algorithm was analyzed regarding its suitability for hardware implementation. The first step of this research is based on a software implementation of LZ4 with regard to the future hardware implementation. As a second step, a simple hardware implementation of LZ4 is evaluated for bottlenecks in the original LZ4 code. Xilinx Virtex-6 and 7 Series FPGAs are used to obtain experimental results. These results are compared to the industry competitor.

Rychlé bezztrátové kompresní algoritmy (Fast Lossless Compression Algorithms)

Authors
Kubalík, P.; Bartík, M.; Ubik, S.
Year
2015
Published in
Sborník příspěvků PAD 2015. Zlín: Universita Tomáše Bati ve Zlíně, 2015, pp. 31-36. ISBN 978-80-7454-522-1.
Type
Conference paper
Abstract
The research deals with the LZ4 lossless compression algorithm (based on LZ77) and its suitability for compressing multimedia data and for universal packet compression for real-time network collaboration in delay-sensitive areas.

An ASIC Linear Congruence Solver Synthesized with Three Cell Libraries

Authors
Buček, J.; Kubalík, P.; Lórencz, R.; Zahradnický, T.
Year
2014
Published in
Proceedings of the 21st IEEE International Conference on Electronics Circuits and Systems. Monterey: IEEE Circuits and Systems Society, 2014. pp. 706-709. ISBN 978-1-4799-4242-8.
Type
Conference paper
Abstract
The paper describes an ASIC implementation of a linear congruence solver, part of a parallel system for solution of linear equations, and presents synthesis results for three different standard cell libraries. The previous VHDL design was adapted to three ASIC technologies (130 nm, 110 nm, and 55 nm) from two different vendors and the synthesized results were mutually compared. The comparison results were further used to obtain a view of design properties in higher density technologies.

System Design of an FPGA Linear Solver

Authors
Buček, J.; Kubalík, P.; Lórencz, R.; Zahradnický, T.
Year
2014
Published in
Proceedings of the Work in Progress Session held in connection with the 40th EUROMICRO Conference on Software Engineering and Advanced Applications and the 17th EUROMICRO Conference on Digital System Design. Linz: Johannes Kepler University, 2014, ISBN 978-3-902457-40-0.
Type
Conference paper
Abstract
The work is focused on the design of a Modular System performing error-free solution of dense linear systems using residue arithmetic in a Xilinx FPGA. The designed system shall use a set of Residual Processors (RPs) for linear system solution in the Residue Number System and reconstruct the set's solution afterwards. The currently proposed system architecture has a single RP, a large DDR memory used for data transfer between a PC and the system, and a built-in MicroBlaze processor. Future work will focus on extending the architecture to implement the entire Modular System consisting of multiple RPs and performing the backward transformation from residue representation into the rational number set.

System on Chip Design of a Linear System Solver

Authors
Buček, J.; Kubalík, P.; Lórencz, R.; Zahradnický, T.
Year
2014
Published in
2014 International Symposium on System-on-Chip Proceedings. Piscataway: IEEE, 2014. ISBN 9781479968909.
Type
Conference paper
Abstract
This paper is focused on hardware error-free solution of dense linear systems using residual arithmetic on a System on Chip Modular System. The designed Modular System uses Residual Processors (RPs) for solving independent linear systems in residue arithmetic and combines the RP solutions into the solution of the linear system. A System on Chip architecture of the Modular System with several RPs is designed, each with a large memory unit used for data transfer and storage. A Xilinx FPGA architecture with a MicroBlaze processor is used to verify the proposed architecture. The experimental results are obtained for an evaluation FPGA board with Virtex 6 and a 1 GiB DDR memory and serve for further theoretical analysis of the system performance for various linear system sizes and the architecture of the system.

Comparison of FPGA and ASIC Implementation of a Linear Congruence Solver

Authors
Buček, J.; Kubalík, P.; Lórencz, R.; Zahradnický, T.
Year
2013
Published in
Proceedings of 16th Euromicro Conference on Digital System Design. Piscataway: IEEE Service Center, 2013. p. 284-287. ISBN 978-0-7695-5074-9.
Type
Conference paper
Abstract
A residual processor (RP) is dedicated hardware for solving sets of linear congruences. RPs are parts of a larger modular system for error-free solution of linear equations in residue arithmetic. We present new FPGA and ASIC RP implementations, focusing mainly on their memory units, which are a bottleneck of the calculation and therefore determine the efficiency of the system. First, we choose an FPGA to easily test the functionality of our implementation, then we do the same in ASIC, and finally we compare the two implementations. The experimental FPGA results are obtained for Xilinx Virtex 6, while the ASIC results are obtained from Synopsys tools with a 130 nm standard cell library. The results also present the maximum matrix dimension fitting directly into the FPGA and the achieved speed as a function of the dimension.

Dedicated Hardware Implementation of a Linear Congruence Solver in FPGA

Authors
Buček, J.; Kubalík, P.; Lórencz, R.; Zahradnický, T.
Year
2012
Published in
The 19th IEEE International Conference on Electronics, Circuits, and Systems, ICECS 2012. Monterey: IEEE Circuits and Systems Society, 2012. p. 689-692. ISBN 978-1-4673-1261-5.
Type
Conference paper
Abstract
The residual processor is dedicated hardware for solving sets of linear congruences. It is a part of the modular system for solving sets of linear equations without rounding errors using the Residue Number System. We present a new FPGA implementation of the residual processor, focusing mainly on the memory unit, which forms a bottleneck of the calculation and therefore determines the efficiency of the system. An FPGA was chosen because it allows us to optimally implement the designed architecture depending on the size of the problem. The proposed memory architecture of the modular system is implemented using the internal FPGA block RAM. Experimental results are obtained for the Xilinx Virtex 6 family. The results present the maximum matrix dimension fitting directly into the FPGA and the achieved speed as a function of the dimension.

Fault Models Usability Study for On-line Tested FPGA

Year
2011
Published in
Proceedings of the 14th Euromicro Conference on Digital System Design. Los Alamitos: IEEE Computer Society Press, 2011, pp. 287-290. ISBN 978-0-7695-4494-6.
Type
Conference paper
Abstract
FPGAs are susceptible to many environmental effects that can cause soft errors (errors which can be corrected by the reconfiguration ability of the FPGA). Two different fault models are discussed and compared in this paper. The first one, the Stuck-at model, is widely used in many applications and is not limited to FPGAs. The second one, the Bit-flip model, can affect SRAM cells that are used to configure the internal routing of the FPGA and to set up the behavior of the Look-Up Tables (LUTs). The change of the LUT behavior is the only Bit-flip effect considered in this paper. A fault model analysis has been performed on small example designs in order to find the differences between the fault models. This paper discusses the relevance of using the two types of models, Stuck-at and Bit-flip, with respect to the dependability characteristics Fault Security (FS) and Self-Testing (ST). Fault simulation using both fault models has been performed to verify the analysis.

Fault-tolerant and fail-safe design based on reconfiguration

Year
2011
Published in
Design and Test Technology for Dependable Systems-on-Chip. Hershey, Pennsylvania: IGI Global, 2011. p. 175-194. ISBN 978-1-60960-212-3.
Type
Book chapter
Abstract
The main aim of this chapter is to present a way to design fault-tolerant or fail-safe systems in programmable hardware (FPGAs) and thus to make FPGAs usable in mission-critical applications as well. RAM-based FPGAs are usually considered unreliable due to the high probability of transient faults (SEUs) and therefore inapplicable in this area. However, FPGAs can be easily reconfigured. Our aim is to utilize an appropriate type of FPGA reconfiguration and to combine it with well-known methods for fail-safe and fault-tolerant design (duplex, TMR), including on-line testing methods for fault detection that then start the reconfiguration process. The calculation of dependability parameters based on reliability models is an integral part of the proposed methodology. The trade-off between the requested level of dependability characteristics of a designed system and the area overhead with respect to possible FPGA faults is the main property and advantage of the proposed methodology.

Faults Coverage Improvement based on Fault Simulation and Partial Duplication

Year
2010
Published in
Proceedings of the 13th Euromicro Conference on Digital System Design. Los Alamitos: IEEE Computer Society Press, 2010. pp. 380-386. ISBN 978-0-7695-4171-6.
Type
Conference paper
Abstract
A method for improving the coverage of single faults in combinational circuits is proposed. The method is based on Concurrent Error Detection, but uses fault simulation to find critical points, the places where faults are difficult to detect. Partial duplication of the design with regard to these critical points can increase the fault coverage at a low area overhead cost. Thanks to the higher fault coverage, the dependability parameters can be increased. The proposed modification is tested on railway station safety device designs implemented in an FPGA.

Reliable Railway Station System based on Regular Structure implemented in FPGA

Year
2009
Published in
Proc. of 12th EUROMICRO Conference on Digital System Design. Los Alamitos: IEEE Computer Society, 2009. pp. 348-354. ISBN 978-0-7695-3782-5.
Type
Conference paper
Abstract
A method for designing a railway station safety device efficiently and scalably is proposed. The safety device for any configuration of a railway station can be built from five basic blocks. These basic blocks are connected together by a universal interface. Each block is based on a finite state machine of the Moore type. Each state machine is divided into three basic parts, where each part is designed as a self-checking circuit ensuring fault detection. Our methodology is intended for final implementation in an FPGA, and hence SEU faults occurring in the system are assumed.