24th Special Call: GPU Testing and Benchmarking OPEN ACCESS COMPETITION RESULTS
We would like to thank all applicants for computation time within the Special Call: GPU Testing and Benchmarking of the 24th Open Access Grant Competition.
A total of 87 183 nodehours of the GPU accelerated part of the Karolina supercomputer were requested. To ensure productive use of the resources, the allocation committee reviewed the results of the technical evaluations and the ratio of registered publications per project.
The Allocation Commission based the allocation decisions on computational readiness. In view of the ample available resources, the committee decided to grant the allocations in full.
The Allocation Commission acknowledged the high technical level of the submitted projects. Out of a maximum score of 5 points, the projects averaged 4 points, with a minimum of 2 points. Eight of the projects achieved the full 5 points.
Among the 18 projects, a total of 87 183 nodehours of the GPU accelerated part of the Karolina supercomputer were allocated.
The Allocation Commission decided on the allocations within the Special Call: GPU Testing and Benchmarking of the 24th Open Access Grant Competition as follows:
Researcher: Vladimír Petrík
Project: Robust object 6D pose estimation for robotics manipulation
Allocation: 6 000 nodehours
Abstract: The majority of robots deployed in complex real-world scenarios rely on visual feedback that guides the manipulator’s actions and motions. The visual feedback is often preprocessed to extract object-centric information about the scene, such as object poses or sizes. Although machine learning brings significant improvements to the accuracy of known-object pose detection, the robustness is not yet sufficient for continuous feedback in robot control. One of the limitations of the state-of-the-art methods is their full-visibility requirement, which makes predictions unstable in cases of partial occlusion of the object. This is a key limitation for robotic manipulation, as occlusions happen naturally during robot-object interactions. The ambition of the project is to improve the robustness of an existing state-of-the-art, award-winning 6D pose estimation system by addressing the problem of partial visibility via large-scale training on appropriately synthesized data. Breakthrough progress on this challenge would allow direct integration of the object pose detector into the robot control pipeline.
Researcher: Rudolf Rosa
Project: THEaiTRE GPT2 Recycling
Allocation: 1 500 nodehours
Abstract: THEaiTRE is a research project which deals with automatically generating theatre play scripts using AI. The project is a collaboration of computational linguists with theatre experts and commemorates 100 years since the premiere of the first theatre play about robots, R.U.R. by Karel Capek. The project has already produced the first theatre play script, titled "AI: When a Robot Writes a Play", 90% of which was generated by the THEaiTRobot 1.0 system, based on the OpenAI GPT2 language model. The play premiered in February 2021 through an online broadcast from the theatre and was viewed by thousands of spectators worldwide. We are interested in generating the theatre plays in Czech, while the GPT2 language model is only available in English. Therefore, for the first play, we had to use automated translation of the model outputs to obtain a Czech script. For the second play, we would like to instead generate the script directly in Czech, for which we need a variant of the GPT2 model able to generate in Czech. We are thus trying to obtain such a model by “recycling” the model with large Czech data, using a method developed by de Vries and Nissim from the University of Groningen. Generating the scripts directly in Czech should lead to a higher quality of the texts.
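The core of the "recycling" approach is to keep the pretrained transformer body frozen and retrain only the lexical (token-embedding) layer on data in the new language. The following is a minimal toy sketch of that parameter-partitioning idea, not the project's actual code; the parameter-group names and the one-step SGD update are illustrative stand-ins for a real GPT2 model.

```python
# Toy sketch of cross-lingual model "recycling": only the embedding
# layer is trainable, the transformer body stays frozen.
import random

random.seed(0)

# Stand-in for GPT-2's parameter groups (named after GPT-2's "wte").
model = {
    "wte": [random.random() for _ in range(4)],                 # embeddings: retrained
    "transformer_blocks": [random.random() for _ in range(4)],  # body: frozen
}
TRAINABLE = {"wte"}  # only the lexical layer is updated on Czech data

def sgd_step(params, grads, lr=0.1):
    """Apply a gradient step only to the trainable parameter groups."""
    for name, grad in grads.items():
        if name in TRAINABLE:
            params[name] = [p - lr * g for p, g in zip(params[name], grad)]

before_body = list(model["transformer_blocks"])
before_wte = list(model["wte"])
sgd_step(model, {name: [1.0] * 4 for name in model})
# After the step, "wte" has moved while "transformer_blocks" is unchanged.
```

In a real setting the same partitioning is expressed by disabling gradients on all non-embedding parameters before training on the new-language corpus.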
Researcher: Karel Ondrej
Project: Knowledge Grounding for Open-Ended Language Generation
Allocation: 3 836 nodehours
Abstract: The Covid times transferred many everyday tasks, such as communication, shopping, and more, into the virtual world. This led to the overloading of various support lines, where the customer had to wait tens of minutes to process a trivial query. But what if we didn't need a person to answer a simple contract question or even to switch to another energy provider? In these cases, systems that can answer a question, conduct a dialogue with the user, and summarize long documents would help. In our project, we want to draw connections between information retrieval and related fields, such as dialogue generation, question answering, and abstractive summarization. This includes employing novel approaches for current research tasks such as knowledge-grounded multi-domain summarization and knowledge-grounded dialogue generation.
Researcher: Karel Chvalovský
Project: GNNs for Reasoning Tasks
Allocation: 1 843 nodehours
Abstract: In automated reasoning and logic, one of the critical decision points is representing the input problem. There is a long history of using graph representations because they make it possible to express various relations naturally. However, stronger reasoning systems, which have a bigger expressive power, require complicated graphs with various types of nodes and edges. Recent advances in graph neural networks led to architectures that make it possible to experiment with such representations easily. Nevertheless, it is still open how to represent massive graphs that can quickly arise when we simultaneously handle many related problems. A usual bottleneck is insufficient memory of GPUs. We want to use the Karolina supercomputer to experiment with training various representations using multiple GPUs and nodes.
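A common remedy for the GPU-memory bottleneck the abstract mentions is to train on sampled subgraphs rather than the full graph, so each mini-batch has a bounded size. The sketch below is a hypothetical illustration of that idea on a toy adjacency list; the function name and fanout parameters are illustrative, not taken from any specific GNN library.

```python
# Toy neighbour-sampling sketch: bound the size of each training
# mini-batch so massive reasoning graphs fit into GPU memory.
import random

random.seed(1)

# Adjacency list of a small undirected toy graph.
adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}

def sample_subgraph(seeds, hops=2, fanout=2):
    """Collect a bounded k-hop neighbourhood around the seed nodes,
    sampling at most `fanout` neighbours per node per hop."""
    nodes = set(seeds)
    frontier = set(seeds)
    for _ in range(hops):
        nxt = set()
        for v in frontier:
            neighbours = adj[v]
            nxt.update(random.sample(neighbours, min(fanout, len(neighbours))))
        frontier = nxt - nodes
        nodes |= frontier
    return nodes

batch = sample_subgraph([0])  # small subgraph instead of the whole graph
```

Multi-GPU training then processes several such subgraphs in parallel, which is exactly the kind of scaling experiment the allocated nodes enable.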
Researcher: Lukáš Soukup
Project: Advanced recognition and information extraction of handwritten texts using neural networks
Allocation: 8 000 nodehours
Abstract: The project is focused on hand-written text recognition (HTR). In the research, we will follow the state-of-the-art approaches in HTR based on transformers. We will use an internal large-scale dataset of Czech historical documents. The outcome of the research shall help with the digitization of hand-written documents. The registration number of the project is CZ.01.1.02/0.0/0.0/20_321/0024760. The official web page of the project is https://www.inkcapture.com.
Researcher: Petr Kouba
Project: Machine Learning for Molecular Dynamics Simulations
Allocation: 4 000 nodehours
Abstract: Molecular dynamics (MD) simulations allow analyzing the physical movements of biomolecules. The generated data are sequences of frames (in the 100 000s) captured at a predefined time step. Each frame consists of the positions of all the atoms of a protein (from 100s to 10 000s), which are simulated using a molecular mechanics force field. The analysis of such a massive amount of data is often challenging, especially for molecules with conformational heterogeneity, such as the disordered Abeta peptide and APOE, which are relevant for Alzheimer's disease (AD). The Abeta peptide is the hallmark of the disease and adopts diverse conformations. APOE is believed to play an important role in Abeta clearance. Understanding the dynamic properties of both the Abeta peptide and APOE is key to determining the effects of drug candidates for potential AD treatment. The objective of this project is to apply existing machine learning tools (such as the VAMPnets neural network [1]) to analyze the MD simulations of Abeta and APOE systems and understand their dynamics.
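VAMPnet-style models are trained on pairs of frames separated by a fixed lag time, from which the network learns slow conformational dynamics. As a minimal, hypothetical sketch of that data preparation step (the frame representation here is a toy stand-in, not real MD data):

```python
def time_lagged_pairs(trajectory, lag):
    """Pair each frame x_t with x_{t+lag}; such time-lagged pairs are the
    standard training input for VAMPnet-style models of slow dynamics."""
    return [(trajectory[t], trajectory[t + lag]) for t in range(len(trajectory) - lag)]

# Toy "trajectory": 6 frames of a single 1-D coordinate.
frames = [[float(t)] for t in range(6)]
pairs = time_lagged_pairs(frames, lag=2)
# pairs == [([0.0], [2.0]), ([1.0], [3.0]), ([2.0], [4.0]), ([3.0], [5.0])]
```

With hundreds of thousands of frames per trajectory, building and batching these pairs for network training is where GPU acceleration pays off.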
Researcher: Martin Vastl
Project: Deep learning for symbolic regression
Allocation: 5 000 nodehours
Abstract: Symbolic regression (SR) is a technique to find a model in the form of analytic equations describing given data. Typically, it has been realized using the genetic programming (GP) method, an evolutionary algorithm that evolves solutions through a stochastic process mimicking natural evolution [1-2]. A disadvantage of this approach is that a new model has to be created from scratch through a time-consuming evolutionary process for each data set. Moreover, no knowledge acquired in these independent runs is reused in subsequent runs. In this project, we will investigate another SR approach based on the idea that one can use transformers, designed for sequence-to-sequence learning, to translate a set of data points into a proper analytic model [3-4]. This translation-based model-producing process can be much less computationally expensive than the evolutionary GP-based one.
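In the seq2seq view of SR, the decoder emits an expression as a token sequence, typically in prefix notation, which is then evaluated against the data. The snippet below is an illustrative sketch of that target representation only (the token vocabulary and evaluator are hypothetical, and no transformer is involved):

```python
def eval_prefix(tokens, x):
    """Evaluate a prefix-notation expression, the kind of token sequence
    a seq2seq SR model would decode from a set of data points.
    Consumes tokens from the front of the list."""
    tok = tokens.pop(0)
    if tok == "add":
        return eval_prefix(tokens, x) + eval_prefix(tokens, x)
    if tok == "mul":
        return eval_prefix(tokens, x) * eval_prefix(tokens, x)
    if tok == "x":
        return x
    return float(tok)  # numeric constant

# Token sequence for x*x + 1 in prefix notation.
expr = ["add", "mul", "x", "x", "1"]
ys = [eval_prefix(list(expr), x) for x in [0.0, 1.0, 2.0]]
# ys == [1.0, 2.0, 5.0]
```

Training then amounts to supervised learning from (data points, token sequence) pairs, so no per-dataset evolutionary search is needed at inference time.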
Researcher: Jan Pichl
Project: Alquist: Conversational Artificial Intelligence
Allocation: 2 000 nodehours
Abstract: The open-domain dialogue system Alquist aims to conduct a coherent and engaging conversation, which can be considered one of the benchmarks of social intelligence. One of the critical parts of the system is the set of generative models that can handle various user utterances and generate meaningful responses. The generative models are trained on extensive conversational data with the goal of generating the most suitable response given the conversation context. Optionally, additional input information can be provided, e.g., knowledge graphs. Our research goal is to train conditioned generative models to have more control over the response of the model. We plan to use various additional inputs, such as dialogue acts, topics, and knowledge graphs, to create a more focused generative model.
Researcher: Jan Hula
Project: SKINNER
Allocation: 1 000 nodehours
Abstract: This project is focused on training a large neural network to predict the behavior of insects. Recently, after the success of large language models, it became apparent that neural networks with hundreds of millions or even billions of parameters trained on large datasets can predict the behavior of very complicated functions. Such language models, trained to predict individual words in a sentence, are able to complete a whole paragraph of plausible-looking text from just one sentence. Our goal is to repeat this success in the domain of animal behavior modeling. Instead of predicting individual words, we would be predicting the behavior of a given animal based on past behavior and the environment. Concretely, we want to predict the behavior of individuals from an order of insects called Orthoptera. These insects exhibit repeating behavioral patterns and at the same time are very active, which makes them a good target for testing our model. We aim to train a large neural network to predict individual steps in this time series, conditioned on the history and on the current state of the environment. The resulting model will be released and could be used by researchers studying animal behavior to, for example, detect repeating behavioral patterns. It should also serve as an inspiration for similar models trained on other species.
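Casting behavior prediction as next-token prediction means turning each recorded behavior sequence into (history, next-step) training examples, exactly as language models do with words. A minimal hypothetical sketch, with made-up behavior labels standing in for real annotated insect data:

```python
def windowed_examples(behaviour, context=3):
    """Turn a behaviour sequence into (history, next-step) pairs,
    mirroring next-token prediction in language models."""
    return [
        (behaviour[i - context:i], behaviour[i])
        for i in range(context, len(behaviour))
    ]

# Toy behaviour sequence; the labels are illustrative only.
seq = ["walk", "stop", "chirp", "walk", "stop", "chirp"]
examples = windowed_examples(seq)
# first example: (["walk", "stop", "chirp"], "walk")
```

A large network trained on many such windows, additionally conditioned on environment state, is the model the project aims to build.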
Researcher: Martin Kišš
Project: Universal text recognition models
Allocation: 2 000 nodehours
Abstract: Project PERO develops methods and software tools for automatic handwritten text recognition which are actively used by libraries and archives in the Czech Republic to transcribe their document collections. The automatic transcriptions enable, for example, full-text search of handwritten documents in existing public portals such as digitalniknihovna.cz. The project provides an open-source OCR package for Python, a REST API for large-scale document recognition, and a free-of-charge web application (pero-ocr.fit.vutbr.cz). Large multi-GPU computation nodes in the Karolina supercomputer allow us to train very large neural networks for universal text transcription – for a large number of languages, handwriting scripts, printed font families, and alphabets. Such universal models are able to utilize the similarities between languages and visual character representations and thus achieve higher text transcription accuracy, especially for poor-quality handwritten documents.
Researcher: Tomas Soucek
Project: Weakly supervised learning for video understanding
Allocation: 8 000 nodehours
Abstract: Building machines that can automatically understand complex visual inputs is one of the central problems in artificial intelligence, with applications in autonomous robotics, automatic manufacturing, or healthcare. The problem is difficult due to the large variability of the visual world. The recent successes are, in large part, due to a combination of learnable visual representations based on neural networks, supervised machine learning techniques, and large-scale Internet image collections. The next fundamental challenge lies in developing visual representations that do not require full supervision in the form of inputs and target outputs, but are instead learnable from only weak supervision, i.e., noisy and only partially annotated data. This project will address this challenge and will develop video representations learned from only weak annotations. More details are available at http://impact.ciirc.cvut.cz/
Researcher: Tomáš Jenícek
Project: Day-night image retrieval
Allocation: 2 000 nodehours
Abstract: Image retrieval is an important and active area in computer vision. The task is to query-by-image in a large indexed collection of images, where the search is based purely on the image content. Applications include content-based browsing and search in large image collections, visual localization, image annotation, data collection for 3D reconstruction, and many others. The current state-of-the-art retrieval methods are based on Deep Neural Networks, which are trained on a GPU. In our project, we address image retrieval under significant illumination changes, such as between day and night images, where the appearance changes dramatically. This is currently an active field of research because of its applicability to real-world tasks such as autonomous driving.
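Retrieval with learned representations reduces to ranking indexed descriptors by their similarity to the query descriptor; the day-night challenge is to train descriptors that stay close under drastic illumination change. The following is a toy, hypothetical sketch of that ranking step with hand-made 2-D descriptors (real systems use high-dimensional CNN embeddings):

```python
import math

def cosine(a, b):
    """Cosine similarity between two descriptor vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def rank(query, index):
    """Rank indexed images by descriptor similarity to the query."""
    return sorted(index, key=lambda name: cosine(query, index[name]), reverse=True)

# Toy descriptors: a day and a night view of the same scene should map
# to nearby vectors; an unrelated image should not.
index = {"day_img": [1.0, 0.1], "night_img": [0.9, 0.2], "other": [0.0, 1.0]}
query = [1.0, 0.0]
ranking = rank(query, index)
# the unrelated image ends up last
```

The research question is precisely how to learn the embedding so that `day_img` and `night_img` of the same place land near each other.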
Researcher: Lukáš Neumann
Project: Architecture Search for Deep Learning
Allocation: 2 000 nodehours
Abstract: Thanks to deep learning techniques, the Artificial Intelligence field has made tremendous progress in the past years, and as a result has been successfully applied to many practical tasks in computer vision, natural language processing (NLP), speech recognition, etc. Whilst training deep networks relies on machine learning algorithms, which, given training data, optimize network parameters for a specific task, the network architecture (i.e., the network connectivity pattern as well as the operations used by individual network nodes) is almost always hand-crafted by humans using a trial-and-error approach. This also applies to the most popular network architectures, such as ResNet [1] or Transformers [2], including commercially exploited network architectures like GPT-2 [3]. Architecture Search methods aim to overcome this limitation by proposing algorithms which systematically explore possible deep network architectures and automatically find the most promising ones. The main challenge of such algorithms is the fact that the search space of all possible deep network architectures is exponentially large, which makes naïve methods such as exhaustive search impossible to use. Therefore, an efficient search strategy has to be applied to quickly discard architectures which do not lead to good accuracy and to focus only on the most promising network architectures.
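The simplest baseline for such a search strategy is random search over a factorized architecture space, scored by a cheap proxy instead of full training. The sketch below is purely illustrative: the search space, the proxy score, and the budget are all made up, and real NAS methods replace the random sampler with something far more sample-efficient.

```python
# Toy random architecture search: sample candidate architectures from a
# small factorized space and keep the one with the best proxy score.
import random

random.seed(42)

SPACE = {"depth": [2, 4, 8], "width": [64, 128], "op": ["conv3x3", "conv5x5"]}

def sample_architecture():
    """Draw one candidate by picking a value for each design choice."""
    return {key: random.choice(options) for key, options in SPACE.items()}

def proxy_score(arch):
    """Hypothetical stand-in for a short training run that estimates the
    accuracy of a candidate architecture."""
    penalty = 2 if arch["op"] == "conv5x5" else 1
    return arch["depth"] * arch["width"] / penalty

# Evaluate a small budget of candidates and keep the best one.
best = max((sample_architecture() for _ in range(20)), key=proxy_score)
```

The exponential size of the real search space is exactly why the project needs strategies that prune bad regions early rather than enumerating candidates.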
Researcher: Oldrich Plchot
Project: Training neural models for speech applications
Allocation: 8 000 nodehours
Abstract: In the last several years, the whole field of speech processing has benefitted from the rapid progress of machine learning techniques, which came hand in hand with continuously increasing computational power. The Speech@FIT group produced excellent results utilizing the FIT BUT computing infrastructure, but the recent shift to big data and large neural models has multiplied the required computational power and negatively impacted the group's ability to compete with peer research institutions or companies. Computational resources provided by the IT4I infrastructure will be used to enable and accelerate research on big data in key areas of speech processing such as automatic speech recognition (ASR), speaker verification (SV), diarization, language identification (LID), source separation, and keyword spotting. The test benchmarks will include training of state-of-the-art models in several fields of speech processing (mostly SV, diarization, ASR).
Researcher: Jonáš Kulhánek
Project: Neural 3D Scene Representation
Allocation: 8 000 nodehours
Abstract: In contrast to classical scene representations, which are designed by humans, e.g., in the form of a 3D model, neural scene representations are learned completely from data. Recent work has shown that such representations can be very powerful, enabling highly accurate rendering of complex and intricate 3D geometry and computation of the position and orientation from which a given photo was taken (which has a wide range of applications, including self-driving cars and Augmented Reality). These two tasks (neural rendering and visual localization) are inherently related: given the position and orientation estimated by a localization algorithm, neural rendering can be used to render a virtual view from the predicted viewpoint. If the real and rendered image do not align precisely, this information can be used to refine the viewpoint prediction. Rather than training separate neural scene representations for rendering and localization, this project aims to make use of their inherent connection by training a single representation for both tasks. We expect that jointly training for both tasks will lead to a more powerful representation, which has the added benefit of reducing resource consumption (using a single instead of two representations).
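The render-and-refine loop the abstract describes is, at its core, gradient descent on a photometric error between the real image and the image rendered from the current pose estimate. A deliberately tiny sketch with a made-up 1-D differentiable "renderer" (real systems render full images from a 6-DoF pose):

```python
def render(pose):
    """Hypothetical differentiable 'renderer': image brightness as a
    linear function of a 1-D camera pose."""
    return 2.0 * pose + 1.0

def refine(pose, real_image, lr=0.1, steps=100):
    """Minimise the squared photometric error between the real and the
    rendered image by gradient descent on the pose estimate."""
    for _ in range(steps):
        err = render(pose) - real_image   # photometric residual
        grad = 2.0 * err * 2.0            # d(err^2)/d(pose), chain rule
        pose -= lr * grad
    return pose

true_pose = 3.0
refined = refine(pose=0.0, real_image=render(true_pose))
# refined converges towards true_pose
```

Training one representation that supports both the rendering and the localization direction of this loop is the project's goal.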
Researcher: Luboš Šmídl
Project: Acoustic transformers for speech recognition
Allocation: 8 000 nodehours
Abstract: Motivated by the human brain and by how children learn new skills, deep neural networks have become very powerful in solving very hard NLP tasks (such as speech recognition) while being conceptually simpler. The training of such large neural networks is possible only due to the huge amount of unlabeled data available on the Internet, together with the stunning computing power of modern GPU clusters. The aim of the project is to use the power of a node with 8 A100 GPUs for training acoustic transformers for speech recognition. Currently, the use of transformer technologies (T5, BERT) in the field of natural language processing is popular and challenging. Acoustic transformers are based on a similar principle - with the help of large computing power and a huge amount of untranscribed data, a representation in the latent space (embedding space) is found. This representation can then be used in the tasks of speech recognition, speaker detection, diarization, keyword detection, query-by-example, etc.
Researcher: Radim Špetlík
Project: Weak Signal Analysis in RGB Images
Allocation: 8 000 nodehours
Abstract: Glass-reflection removal, or glass-glare removal, is a problem of significant practical importance, with applications ranging from license plate reading [1] to digital cleaning of camera optics [2]. Given a single photo, the task is to remove reflections, or glares, without affecting the background. Much research in recent years has focused on instances of reflection removal constrained by specific qualitative attributes, such as the requirement of full-screen reflection [6,7,8]. Glass-glare removal research has assumed constraints imposed by the development environment [3,1] or requirements of additional specialized hardware [4,5,2]. We address the general problem of reflection, or glare, removal, aiming at reducing the heavy constraints required by previous work, with a focus on the whole spectrum of the problem – we consider reflections of all sizes and strengths, and we only require a single RGB image as an input.
Researcher: Jan Lehecka
Project: Text Transformers for NLP tasks
Allocation: 8 000 nodehours
Abstract: In the last few years, deep neural networks known as Transformers have dominated the research field of Natural Language Processing (NLP). These highly sophisticated models benefit from the combination of a huge amount of unlabeled text data available on the Internet, self-supervised training methods motivated by the learning skills of the human brain, and the ever-increasing computational power of high-end GPU and TPU clusters. The main idea behind text-based Transformers is to let the model read as much text as possible during the pre-training phase in order to learn high-level language representations of individual words while paying sophisticated attention to their context. This is performed by pre-training the model on artificial tasks based on repairing corrupted or perturbed inputs. After that, the model is able to generate contextual embeddings encoding both the syntax and semantics of the input text. These embeddings can be easily fine-tuned to solve a large variety of NLP tasks, such as text classification, text generation, chatbots, etc. Our research is focused on three types of Transformers: (1) Encoders are suitable for text classification tasks (RoBERTa [1]), (2) Decoders are suitable for generation of text (GPT-2 [2]), and (3) Encoder-Decoder models solve text-to-text problems (T5 [3]). Our models have already achieved state-of-the-art results in several Czech NLP tasks, including text classification [4], sentiment analysis [5], and post-processing of ASR output [6].
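The "repairing corrupted or perturbed inputs" pre-training objective can be sketched as follows: randomly replace some tokens with a mask symbol and keep the originals as reconstruction targets. This is a hypothetical toy version of the masking step only (real pipelines use subword tokenizers and also substitute random tokens, per the BERT recipe):

```python
# Toy masked-token corruption for self-supervised pre-training.
import random

random.seed(7)

def mask_tokens(tokens, mask_rate=0.3, mask_token="[MASK]"):
    """Corrupt a token sequence; the model is then trained to
    reconstruct the original token at each masked position."""
    corrupted, targets = [], []
    for tok in tokens:
        if random.random() < mask_rate:
            corrupted.append(mask_token)
            targets.append(tok)       # loss computed here
        else:
            corrupted.append(tok)
            targets.append(None)      # no loss at unmasked positions
    return corrupted, targets

original = ["the", "cat", "sat", "on", "the", "mat"]
corrupted, targets = mask_tokens(original)
```

The model never sees the targets directly; predicting them from the corrupted context is what forces it to learn syntax and semantics.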