React Development for IT Students

22 February 2022
15:00-18:00
Workshop at Ness Ostrava

A purely hands-on workshop for computer science students is coming up. This time the topic is Development in React & Spring Boot.

Program: introduction to React, Spring Boot, Web Services, REST services, building a simple web application with React & Spring…

The workshop is completely free, and all you need on site is your own laptop.

The program is split into two afternoons. The first, on 22 February, focuses on Development in React. You can charge up on new knowledge from 15:00 at the Ostrava office of Ness Czech, 28. října 3348/65, Ostrava.

The second afternoon, on Development in Spring Boot, takes place on 8 March from 15:00 to 18:00, also in person if the epidemiological situation allows. For up-to-date information, follow our website www.ness.cz

Senior Consultants Vladimír Kočur and Radek Porazil will guide you through the workshop.

Capacity is limited. If you are interested or have any questions, do not hesitate to contact us at [email protected]

We look forward to seeing you!

Discover the Benefits of Local SAP Support

15 February 2022
15:00-16:30
Online

Do you use SAP and are you dissatisfied with its support?

If so, accept our invitation to a free webinar hosted by SAP experts from Ness Czech on Tuesday, 15 February, from 15:00 to 16:30. We will talk about the benefits of relocating SAP system support to the countries of Central and Eastern Europe.

  • Does the business keep pushing new requirements on you that you cannot meet quickly and flexibly?
  • Do you want to focus on the migration to SAP S/4HANA but lack the capacity to support current users and learn new things?
  • Are your manufacturing plants spread across Central and Eastern Europe?
  • Does each of your plants face its own unique requirements, for example from HR or from local legislation and regulations?

If you answered YES more than once, our webinar will be of considerable benefit to you.

We will show you that we can support your SAP team and your business users in an agile and efficient way.

You can look forward to interesting guests!

Ivan Černík of IDC will talk about why it is generally important to have working support for running systems, what IDC's survey identifies as its greatest benefit for end customers, and what, conversely, troubles them about support. He will also cover what they base their decision on when choosing such a service.

Zdeněk Bejček, a representative of ABB, will talk about the previous state of support for their SAP systems in the CEE region, the business reasons why global support did not satisfy them, and how they resolved the situation.

Ivo Procházka, Senior Sales Representative, and Karel Eminger, Competence Manager, will add how we at Ness Czech handle SAP support differently and better.

If you have any questions, do not hesitate to contact us at [email protected]

AWS Financial Services Game Day Hosted by Ness Digital Engineering

Event Details:

  • Location: Online event via Zoom
  • Date & Time: 8am – 1pm ET, Friday, February 25, 2022

About AWS Game Day

The AWS Financial Services GameDay is a hands-on learning experience designed to familiarize customers with AWS and deepen their cloud skills through real-world simulation. Participants will develop modules required to manage a sustainable and reliable investment portfolio, including:

  • Integrating real-time market prices and ESG scores from Refinitiv, an LSEG business, with Amazon Kinesis Analytics
  • Testing workload resiliency and protecting against service disruptions using Gremlin and Amazon EC2 AutoScaling
  • Building financial analytics for portfolio modeling using Amazon FinSpace
  • Using machine learning for price prediction and news sentiment using Amazon Comprehend and Amazon Forecast

About Ness Digital Engineering

Ness Digital Engineering is an AWS Premier Consulting Partner with AWS Financial Services, Migration, and DevOps Competencies. We combine business domain knowledge, technology expertise, and a disciplined process to ensure the success of the most challenging projects in the industry. We’re uniquely positioned to help clients at any stage of their cloud journey with AWS and are proud of our 300+ AWS Certifications. All GameDay presenters are AWS Certified Professionals with years of experience in designing, architecting, and delivering complex cloud projects.

End-to-End Development Service for Threat Management Security

Case Study

The Challenge

The client, a provider of a threat management security solution for web applications and cloud environments, was looking for an end-to-end development partner to improve its solution.

The Solution

First, Ness analyzed the system to determine how to call the third-party service API and set the desired object list. In the development stage, a collector template was created and testing was automated. We then built a collector on the Node.js tech stack and deployed the collector package to an S3 bucket. An AWS CloudFormation Template (CFT) then invoked the package from the S3 bucket, pulled the logs, and pushed them into the console.

The Results

  • Blocks ransomware and variants of malware as they arrive in phishing emails.
  • Blocks execution when opened as an attachment.
  • Thwarts multiple attack techniques that try to compromise endpoints, gain access to resources, and detonate payloads.
  • Achieves multi-vector attack monitoring and isolation that recognizes techniques and stops them early before any damage is done.
  • Works alongside existing anti-virus tools to provide an additional layer of defense.

Building a Solution to Identify Host Information using Probes

Case Study

The Challenge

The client, an AI-based cybersecurity product company funded by Comcast, needed a solution developed which allowed its product to identify host information on the basis of IP addresses.

The Solution

Ness developed a solution to identify host info using probes. Probes can query remote systems and retrieve all their details. By developing probes (AD Probe & Network Probe) for a larger environment, the product can collect host info from several million computers by traversing multiple active directories.

The Results

  • Now able to quickly identify and replay anomalous network behaviors that represent movement by infected hosts or malicious insiders seeking to exfiltrate proprietary data.
  • Ability to construct a 360-degree view of the entire cyber threat kill chain, enabling customers to detect, analyze, and contain any threats originating from outside or inside the network.
  • Examines more than 4,000 network protocols for potential malicious events and performs machine learning, network-based forensic detection, speculative code execution, and behavioral analysis on all communications.

Ness Develops SD-WAN Solution for World Leader of Telecommunications

Case Study

The Challenge

The client, a US-based market leader in secure real-time communications solutions for the cloud, network, and enterprise edge, was looking to deliver an encrypted tunnel architecture over their existing WAN. This would allow them to provide voice, video, and business application performance over multiple WAN connections, as well as configure application-specific routing, multi-link performance, and stateful SIP transfer.

The Solution

Ness developed an SD-WAN solution that helps prioritize UC traffic, together with an ML-based application classification that prioritizes the traffic dynamically. This allowed for an engineered business continuity plan that places LTE radio as the secondary connection during an outage.

The Results

  • Edge integration with analytics in the cloud allows dynamic traffic routing without the need for h/w or s/w shipment
  • 8X bandwidth savings and superior customer experience
  • Significant reduction in operating costs
  • Improved security and quality of service

Cryptocurrencies Analysis and Return Forecasting

Introduction

Since 2008, Bitcoin has gained a prominent place in the international financial landscape, and conversations about cryptocurrencies have become a frequent subject on social media, attracting more investors every day.

The cryptocurrency market is still considered one of the most controversial subjects in the financial sector. It was built on decentralized trust, without any central authority, and no government or central bank regulates its value as happens with national currencies. This causes some organizations to fear dealing in cryptocurrencies, on the assumption that they threaten the traditional economic system. Others see cryptocurrencies as a solution to the lack of confidence in the financial system, and the success of the cryptocurrency market is undeniable.

Cryptocurrencies can be subject to the same types of analysis as national currencies, and conclusions about the correlation of similar instruments can be reached. This study explores the data and suggests methods for forecasting the weighted average return of two of the major cryptocurrencies: Bitcoin and Ethereum. The analysis uses minute-by-minute, high-frequency Bitcoin and Ethereum market data dating back to 2018, from Kaggle (1).

The most challenging part of predicting the return of cryptocurrencies is the high volatility of the data, since the crypto market is still at a very nascent stage compared to other investment tools and currencies.
The lack of a clear pattern that the human eye can detect in the data makes the model highly prone to overfitting the training set.

Why predict the return rather than the close price itself?

Most financial studies involve returns, instead of prices, of assets.
Campbell, Lo, and MacKinlay (1997) gave two main reasons for using returns (2):

  1. First, for average investors, return of an asset is a complete and scale-free summary of the investment opportunity.
  2. Second, return series are easier to handle than price series because the former have more attractive statistical properties (e.g., stationarity).

Econometrics and Preliminary Data Analysis

Here we will perform a preliminary data analysis to get a better understanding of the dataset.

This dataset contains price information on historic trades for several crypto assets. In this post, the focus is on two of the major cryptocurrencies in the market: Bitcoin and Ethereum.

Dataset Attributes

  • Timestamp
  • Asset ID
  • Stock Prices: Open, Low, High, and Close Prices
  • Count: The number of trades that took place during the minute
  • Volume: The number of crypto asset units traded during the minute
  • VWAP: The volume-weighted average price for the minute
  • Target: The 15-minute residualized return

Target Value

Return: definition and equation

The return provided in the dataset is a near-future return on the prices:

\(Target^a(t) = R^a(t) - \beta^a M(t)\)

\(R^a(t) = \log(P^a(t+16)/P^a(t+1))\)

\(M(t) = \frac{\sum_a \omega^a R^a(t)}{\sum_a \omega^a}\)

\(\beta^a = \frac{\langle M \cdot R^a \rangle}{\langle M^2 \rangle}\)

Where: \(P^a(t)\) is the price of asset \(a\) at minute \(t\), \(R^a(t)\) is the log return of asset \(a\) over 15 minutes, \(\omega^a\) is the asset weight, \(M(t)\) is the weighted average market return, and \(\langle \cdot \rangle\) denotes an average over time.
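The definitions above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function names, the synthetic prices, and the weights are made up for the example, and the time average in \(\beta^a\) is taken over the whole sample, which may differ from how the dataset's target was actually produced.

```python
import numpy as np

def log_return_15m(prices):
    """R^a(t) = log(P^a(t+16) / P^a(t+1)) for a (T, A) array of minute prices.

    The last 16 rows are NaN because the required future prices do not exist.
    """
    prices = np.asarray(prices, dtype=float)
    returns = np.full(prices.shape, np.nan)
    returns[:-16] = np.log(prices[16:] / prices[1:-15])
    return returns

def residualized_target(prices, weights):
    """Target^a(t) = R^a(t) - beta^a * M(t); <.> is assumed to be a full-sample mean."""
    R = log_return_15m(prices)                # shape (T, A)
    w = np.asarray(weights, dtype=float)      # shape (A,)
    M = (R * w).sum(axis=1) / w.sum()         # weighted market return, shape (T,)
    ok = ~np.isnan(M)                         # rows where the return is defined
    beta = (M[ok][:, None] * R[ok]).mean(axis=0) / (M[ok] ** 2).mean()
    return R - beta * M[:, None]

# Synthetic example: three assets following geometric random walks.
rng = np.random.default_rng(0)
prices = 100.0 * np.exp(np.cumsum(0.001 * rng.standard_normal((500, 3)), axis=0))
weights = np.array([6.0, 5.0, 2.0])           # made-up asset weights
target = residualized_target(prices, weights)
```

With these definitions the weighted sum of the targets across assets is zero at every minute, which is a quick sanity check that the market component has been removed.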

Visualizing the data

Ethereum

Note the high volatility of the data that is reflected in the return time series with some values that look like outliers (e.g., Bitcoin high return around October 2019).

To prove or disprove that point, let us take a closer look at the signal around that time.

The following diagram focuses on Bitcoin over the period between Oct 2019 and Nov 2019.

After zooming in on the Bitcoin visualization, we note that the sudden shock in the return series is not an anomaly, since it is not a single isolated outlier. The shock in the graph is caused by the price surge in the market that took place around the end of October 2019.

Usually, some noise in the dataset can act as a regularization technique, as it smooths the final model and helps it generalize better and avoid overfitting. With this highly volatile dataset, however, the noise would not do the training any good, and it might even cause the model to overfit, since the data would be too complex with the added noise. For that reason, the noise and outliers need to be removed from the training dataset.

“Obviously, if your training data is full of errors, outliers, and noise (e.g., due to poor-quality measurements), it will make it harder for the system to detect the underlying patterns, so your system is less likely to perform well” (2)

Correlation between Cryptocurrencies

To get more insight into the linear relation between the two cryptocurrencies, let us visualize them on the same diagram.

Note that due to the difference in price scale between different cryptocurrencies, we have standardized the two time series to get a more meaningful visualization.

Note the high but variable correlation between the assets. The dynamics change over time, which is important to keep in mind when forecasting.
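As a rough sketch of the two steps just described, the snippet below standardizes two series and computes their correlation over a trailing window. The synthetic series, the helper names, and the 60-sample window are assumptions chosen for illustration, not the data or settings behind the charts.

```python
import numpy as np

def standardize(x):
    """Rescale a series to zero mean and unit variance (z-score)."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

def rolling_correlation(x, y, window):
    """Pearson correlation of x and y over each trailing window; NaN until a full window exists."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    out = np.full(x.shape, np.nan)
    for end in range(window, len(x) + 1):
        out[end - 1] = np.corrcoef(x[end - window:end], y[end - window:end])[0, 1]
    return out

# Synthetic stand-ins for the two close-price series, on very different scales.
rng = np.random.default_rng(1)
common = np.cumsum(rng.standard_normal(300))
btc = 30000 + 500 * (common + 0.3 * rng.standard_normal(300))
eth = 2000 + 40 * (common + 0.3 * rng.standard_normal(300))

btc_z, eth_z = standardize(btc), standardize(eth)   # comparable on one chart
corr = rolling_correlation(btc, eth, window=60)     # varies over time
```

Standardizing only changes the scale of each series, so the rolling correlation is the same whether it is computed on the raw or the standardized values.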

The incredible rise in the cryptocurrency market (in this example, Bitcoin and Ethereum) occurs around March 2021, which illustrates the impact of COVID-19-related media on the crypto market.

Return distribution

The distribution resembles a normal distribution in that it is centered around the mean, but it has a very sharp peak and a wide base.

Taking a closer look at the distribution of Bitcoin’s return, note that the less frequent values are widely spread around the mean.

Price Distribution

Close price distribution of the two assets (Bitcoin on the left, Ethereum on the right)

Note that the close price distribution is positively skewed: most of the mass sits on the left, with a long tail of high prices to the right, which results in a low mean relative to the price range and a high variance.

In the Bitcoin price distribution, we recognize four distinct peaks (local maxima) in the probability density. The distribution looks multimodal (3), with four modes: the one on the left side of the Bitcoin figure is the major mode (Acrophase), while the others are minor modes.

Extreme Returns

To study the extreme values in the return, which is the target of forecasting, we need to analyze the data using two major econometric measures: Excess Kurtosis and Skewness.

These statistical tools better represent the extremes of the data set rather than focusing solely on the average.

Positive extreme returns are critical when holding a short position. A short seller needs to know when positive returns are extreme: if the return forecast indicates a reversion to the mean, the seller can identify the peak in advance and time the investment accordingly. Negative extreme returns, in turn, matter for risk management, since it is important to know when negative returns are extreme and a reversion to the mean, and to positive returns, is expected.

Excess Kurtosis

The excess kurtosis of a normal random variable is zero. A distribution with positive excess kurtosis is said to have heavy tails, which is the case with the return distribution of the two cryptocurrencies. This implies that the distribution puts more mass on the tails of its support than a normal distribution does.

In practice, this means that a random sample from such a distribution tends to contain more extreme values. Such a distribution is said to be leptokurtic.

Sample kurtosis can be calculated through the following equation:

\(\hat{K}(x) = \frac{1}{(T-1)\hat{\sigma}^4_x}\sum\limits_{t=1}^T(x_t-\hat{\mu}_x)^4\)

where T is the number of observations.

Bitcoin and Ethereum return kurtosis, respectively:

Bitcoin Return Kurtosis: 65.83791065718604
Ethereum Return Kurtosis: 75.85892119668719

From the output, we can see that the excess kurtosis is quite high (leptokurtic) for both cryptocurrencies compared to the kurtosis of a normal distribution, which means that we have a lot of extreme values.

Skewness in Return

Skewness in the return distribution shows up as asymmetry relative to the symmetrical bell curve: the data piles up to the left of the curve (positive skewness) or to the right (negative skewness).

From the visualization, the return looks almost symmetrical. To get more precise numbers, and to find out whether the data is skewed to the right or the left, we calculated the skewness for both Bitcoin and Ethereum returns.

Sample skewness can be calculated through the following equation:

\(\hat{S}(x) = \frac{1}{(T-1)\hat{\sigma}_x^3}\sum\limits_{t=1}^T(x_t - \hat{\mu}_x)^3\)

where T is the number of observations, \(\hat{\mu}_x\) is the sample mean, and \(\hat{\sigma}_x\) is the sample standard deviation.

Bitcoin and Ethereum return skewness, respectively:

Bitcoin Return Skewness: 1.4296588537329864
Ethereum Return Skewness: 0.6970627924698324

Notice that both returns have positive skewness, with Bitcoin's roughly twice as large as Ethereum's, which means that we have a few large gains and frequent small losses.

These highly frequent extreme values raise the bar for any ML forecasting model.
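For readers who want to reproduce these two statistics, here is a minimal NumPy sketch that follows the sample formulas given in this section term by term. The function names and the toy inputs are my own; the post does not state whether the reported kurtosis figures already have 3 subtracted, so both variants are shown.

```python
import numpy as np

def sample_skewness(x):
    """S = sum((x - mu)^3) / ((T - 1) * sigma^3), with sigma the sample std (ddof=1)."""
    x = np.asarray(x, dtype=float)
    T = x.size
    mu = x.mean()
    sigma = x.std(ddof=1)
    return ((x - mu) ** 3).sum() / ((T - 1) * sigma ** 3)

def sample_kurtosis(x):
    """K = sum((x - mu)^4) / ((T - 1) * sigma^4); equals about 3 for normal data."""
    x = np.asarray(x, dtype=float)
    T = x.size
    mu = x.mean()
    sigma = x.std(ddof=1)
    return ((x - mu) ** 4).sum() / ((T - 1) * sigma ** 4)

def sample_excess_kurtosis(x):
    """Excess kurtosis: kurtosis minus 3, so a normal distribution scores about 0."""
    return sample_kurtosis(x) - 3.0

symmetric = [1, 2, 3, 4, 5]        # symmetric toy sample: zero skewness
right_tailed = [1, 1, 1, 2, 10]    # one large positive value: positive skewness

skew_symmetric = sample_skewness(symmetric)
skew_right = sample_skewness(right_tailed)
excess_symmetric = sample_excess_kurtosis(symmetric)
```

Applied to a return series, a large positive excess kurtosis signals heavy tails, and a positive skewness signals that the large moves are mostly gains.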

Stationarity

A stationary process has the property of non-changing mean, variance, and autocorrelation structure over time (4).

A time series is said to be strictly stationary if the unconditional joint probability distribution does not change when shifted in time. This is a very strong condition that is hard to verify empirically.
A weaker version of stationarity is often assumed. Weak stationarity implies that the time plot of the data would show the values fluctuating with constant variation around a fixed level.

In applications, weak stationarity enables one to make inference concerning future observations (e.g., prediction) (2).

Stationarity is important because many useful analytical tools and statistical tests rely on it.

Stationarity Test

In the finance literature, it is common to assume that an asset return series is weakly stationary, while the price series of an asset tends to be nonstationary. The nonstationarity is mainly due to the fact that there is no fixed level for the price (2).

We will check both assumptions empirically using the ADF test.

Augmented Dickey-Fuller ADF Test – Unit-Root Test

In econometrics and statistics, the ADF test is used to check whether unit-root nonstationarity is present in a given time series. It therefore indicates, at a chosen level of confidence, whether the time series is stationary (5).

After applying ADF on the target return series we found out that the unit-root hypothesis is rejected (5), and the series is stationary. For the close price time series, the unit-root hypothesis could not be rejected, and the series is nonstationary. That can be seen by looking at the expanding mean and standard deviation of the close price of Bitcoin.

Note how the expanding standard deviation jumps noticeably around the beginning of 2021.

Conclusion: we have a nonstationary price time series and a stationary target return. This conclusion will help in future work when applying econometric models.
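To make the idea of the unit-root test concrete, here is a simplified sketch of the plain Dickey-Fuller regression, that is, the ADF test with zero lagged differences, written with NumPy only. It is not the routine used for the results above: a full ADF implementation (for example `adfuller` in statsmodels) also selects lag terms and reports p-values. The function name and the synthetic series are assumptions for illustration.

```python
import numpy as np

def dickey_fuller_stat(y):
    """t-statistic of gamma in the regression dy_t = alpha + gamma * y_{t-1} + e_t.

    Strongly negative values are evidence against a unit root.
    """
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    X = np.column_stack([np.ones(len(dy)), y[:-1]])   # constant and lagged level
    coef, _, _, _ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ coef
    dof = len(dy) - X.shape[1]
    sigma2 = resid @ resid / dof                       # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)              # OLS coefficient covariance
    return coef[1] / np.sqrt(cov[1, 1])

# Asymptotic 5% Dickey-Fuller critical value for the constant-only case.
CRITICAL_5PCT = -2.86

rng = np.random.default_rng(42)
noise = rng.standard_normal(1000)             # stationary by construction
walk = np.cumsum(rng.standard_normal(1000))   # unit root by construction

stat_noise = dickey_fuller_stat(noise)
stat_walk = dickey_fuller_stat(walk)
reject_unit_root_noise = stat_noise < CRITICAL_5PCT
```

The statistic must be compared against Dickey-Fuller critical values rather than the usual t-distribution, which is why the threshold is hard-coded here.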

Trends and Seasonality

Now, let us take a closer look into the closing price time series by decomposing it into three main signals:

  • Trend: A pattern in data that shows the movement of a series. Increasing or decreasing slope in the time series
  • Seasonality Component: Explains periodic ups and downs
  • Residuals: what is left over after fitting the model of the trend and seasonal components

Decomposition is done using the seasonal decomposition functionality in the statsmodels API, which requires specifying the model:

The additive model is Y[t] = T[t] + S[t] + e[t]
The multiplicative model is Y[t] = T[t] * S[t] * e[t]

The results are obtained by first estimating the trend by applying a convolution filter to the data. The trend is then removed from the series and the average of this de-trended series for each period is the returned seasonal component (6).
The visualization here represents the annual decomposition for Bitcoin close price time series:

Note that the results are related to the chosen model.
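The procedure described above (moving-average trend, then per-period averages of the de-trended series) can be re-created in a simplified form with NumPy. This is a hand-written sketch of the idea, not the statsmodels implementation, and the function name, the period of 4, and the toy series are assumptions for illustration.

```python
import numpy as np

def naive_decompose(y, period, model="additive"):
    """Split y into trend, seasonal, and residual parts with a centered moving average."""
    y = np.asarray(y, dtype=float)
    # Convolution filter: for an even period the two end weights are halved
    # so that the window stays centered on each observation.
    if period % 2 == 0:
        kernel = np.r_[0.5, np.ones(period - 1), 0.5] / period
    else:
        kernel = np.ones(period) / period
    half = len(kernel) // 2
    trend = np.full(y.shape, np.nan)
    trend[half:len(y) - half] = np.convolve(y, kernel, mode="valid")

    detrended = y - trend if model == "additive" else y / trend
    # Average the de-trended values at each position within the period.
    phase_means = np.array([np.nanmean(detrended[i::period]) for i in range(period)])
    if model == "additive":
        phase_means -= phase_means.mean()    # seasonal effects sum to zero
    else:
        phase_means /= phase_means.mean()    # seasonal factors average to one
    seasonal = np.tile(phase_means, len(y) // period + 1)[:len(y)]

    resid = y - trend - seasonal if model == "additive" else y / (trend * seasonal)
    return trend, seasonal, resid

# Toy series: linear trend plus a repeating pattern of length 4.
t = np.arange(40)
pattern = np.array([1.0, -1.0, 2.0, -2.0])
y = 0.5 * t + pattern[t % 4]
trend, seasonal, resid = naive_decompose(y, period=4, model="additive")
```

On this toy series the additive model recovers the pattern exactly and leaves a zero residual; on real prices, a large residual under either model is a sign that the chosen model fits poorly.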

Several observations can be made from the data. The upward trend has accelerated since 2021.

We have a remarkably high residual, which means that many points are not captured by the trend and seasonal components under the multiplicative model. As a result, the multiplicative model is not the best one to fit the data. There are other, more advanced decompositions, such as Seasonal and Trend decomposition using Loess (STL) (7), that are worth exploring (outside the scope of this post).

Machine Learning Models and Forecasting

It is time to build a model to predict the return for the different cryptocurrencies.

Feature Engineering and Data Preparation

The feature set is composed of the stock prices (open, low, high, and close), volume, and count. The data was cleaned, gaps and missing values were imputed, and values were standardized. Initially, I partitioned the data into three subsets following the common 50%-25%-25% rule for training, validation, and testing, respectively. While experimenting, I changed the share of each subset: I increased the training share, since roughly the first 50% of the data does not seem to be representative of the whole, and that does not do the training any good.
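A partition like this is usually done in time order, without shuffling, so that validation and test data come strictly after the training data. The sketch below shows one way to do that; the function name and the integer rows are placeholders, and whether the original split was implemented exactly this way is an assumption.

```python
def chronological_split(rows, train=0.5, val=0.25):
    """Split time-ordered rows into train/validation/test without shuffling."""
    n = len(rows)
    train_end = int(n * train)
    val_end = train_end + int(n * val)
    return rows[:train_end], rows[train_end:val_end], rows[val_end:]

# Placeholder data: 100 time-ordered observations.
rows = list(range(100))
train_rows, val_rows, test_rows = chronological_split(rows)
```

Raising the `train` argument reproduces the adjustment described above, where the training share is increased beyond 50%.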

Data partitions

Neural Network

Let us classify algorithms we could use for forecasting the target into three wide categories:

  • Deep learning models
  • Econometric models
  • Other regressors that do not fall under the first two categories

This post focuses on the first approach, deep learning, and on Recurrent Neural Networks (RNNs) and LSTM in particular. Recurrent Neural Networks are a class of nets that can predict the future. They can analyze time series data such as stock prices and tell when to buy and sell (8).

Image source (9)

In general, a deep neural network may suffer from unstable gradients during training, either exploding or vanishing, which can be addressed with techniques such as dropout, normalization layers, and good initialization.

In the backpropagation phase, the gradient of the loss function is computed for each layer, starting from the last layer up to the input layer, and the weights of each layer are updated based on the gradient error. With a large learning rate, the gradients can grow bigger and bigger, and the weights with them, which would cause the algorithm to diverge. To avoid that, we need to use a small learning rate or a good learning schedule. Using a saturating activation function also helps to alleviate that problem.

The recurrent structure of an RNN creates a simple form of memory, since the output at a specific point in time is a function of the inputs from previous time steps. Each neuron in an RNN represents a simple memory cell. This memory cell is limited in the length of the time series it can remember. This limitation can be addressed by using Long Short-Term Memory (LSTM) cells in RNN layers, which help the network learn long series more accurately than pure RNN cells.

When forecasting time series, it is common to remove the trend and seasonality first from the series before the training phases, then add them to the predictions. While this is not a required step for RNN, this procedure improves the predictive performance in some cases since the model does not have to learn the trend and seasonality too. (8)

LSTM has feedback connections. It can process not only single data points (such as images), but also entire sequences of data. A common LSTM unit is composed of a cell, an input gate, an output gate and a forget gate. The cell remembers values over arbitrary time intervals and the three gates regulate the flow of information into and out of the cell (10).

LSTM can learn to recognize important input and memorize it.
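To show how the cell and the three gates interact, here is a minimal NumPy sketch of one time step of a standard LSTM cell. It is for intuition only and is not the model built in this post; the weight layout (gates stacked as input, forget, candidate, output) and all names are assumptions of the sketch.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step.

    x: input (D,), h_prev/c_prev: previous hidden and cell state (H,).
    W: (4H, D), U: (4H, H), b: (4H,), stacked as input, forget, candidate, output.
    """
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0:H])            # input gate: how much new information to store
    f = sigmoid(z[H:2 * H])        # forget gate: how much of the old cell state to keep
    g = np.tanh(z[2 * H:3 * H])    # candidate values to write into the cell
    o = sigmoid(z[3 * H:4 * H])    # output gate: how much of the cell to expose
    c = f * c_prev + i * g         # updated cell state (the long-term memory)
    h = o * np.tanh(c)             # updated hidden state (the output)
    return h, c

# Run a short random sequence through a cell with 3 inputs and 4 hidden units.
rng = np.random.default_rng(0)
D, H = 3, 4
W = 0.1 * rng.standard_normal((4 * H, D))
U = 0.1 * rng.standard_normal((4 * H, H))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for x in rng.standard_normal((10, D)):
    h, c = lstm_step(x, h, c, W, U, b)
```

The cell state `c` is only scaled by the forget gate and added to, which is what lets information survive across many time steps.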

Implementation

Loss Function

The loss function is critical when training the model, and it should be chosen based on the data fed to the model. For example, when training a single model for a set of cryptocurrencies, it is worth considering loss functions other than MSE, especially if the target value range depends on the asset, because you do not want to penalize the model for each error as heavily as mean squared error does (11).

I experimented with the following loss functions:

  • Mean Squared Error
  • Mean Squared Logarithmic Error
  • Mean Absolute Error Loss
  • A custom loss function based on correlation as a metric
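The custom loss is not shown in this post, so the following is only a plausible sketch of what a correlation-based loss can look like, written with NumPy next to MSE and MAE for comparison. The function names and the sample values are assumptions; inside a deep learning framework the same arithmetic would have to be expressed with that framework's tensor operations.

```python
import numpy as np

def mse(y_true, y_pred):
    return float(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))

def mae(y_true, y_pred):
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

def correlation_loss(y_true, y_pred, eps=1e-12):
    """1 - Pearson correlation: 0 when perfectly correlated, 2 when perfectly anti-correlated."""
    yt = np.asarray(y_true, dtype=float)
    yp = np.asarray(y_pred, dtype=float)
    yt = yt - yt.mean()
    yp = yp - yp.mean()
    corr = (yt * yp).sum() / (np.sqrt((yt ** 2).sum() * (yp ** 2).sum()) + eps)
    return float(1.0 - corr)

y = np.array([0.1, -0.2, 0.05, 0.3, -0.1])
scaled = 3.0 * y + 2.0            # right direction, wrong scale and offset

loss_corr_scaled = correlation_loss(y, scaled)   # close to 0
loss_mse_scaled = mse(y, scaled)                 # large
```

The comparison illustrates the design choice: a correlation-based loss rewards getting the direction and relative size of moves right and ignores scale and offset, whereas MSE penalizes both heavily.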

Activation Function

I used the hyperbolic tangent function, a saturating activation function that helps mitigate the exploding gradients problem. I experimented with other activation functions, but the hyperbolic tangent gave the best results.

Number of Layers and Neurons in each Layer

For the number of layers and neurons, I followed the “stretch pants” approach of Vincent Vanhoucke, a scientist at Google (8): go with a bigger network with more layers and neurons than needed, and use early stopping and plenty of regularization to prevent the neural network from overfitting.
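Early stopping itself is a simple rule: stop once the validation loss has not improved for a given number of epochs, and keep the best epoch. The sketch below shows that rule in plain Python; the function name, the patience of 3, and the list of losses are made up for illustration and are not results from my training runs.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (best_epoch, stop_epoch) for a sequence of per-epoch validation losses."""
    best_loss = float("inf")
    best_epoch = -1
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch, wait = loss, epoch, 0   # improvement: reset the counter
        else:
            wait += 1
            if wait >= patience:                           # no improvement for too long
                return best_epoch, epoch
    return best_epoch, len(val_losses) - 1

# Made-up validation losses: improvement stalls after epoch 2.
losses = [1.0, 0.8, 0.7, 0.72, 0.71, 0.73, 0.9]
best_epoch, stop_epoch = early_stopping_epoch(losses, patience=3)
```

In practice the weights from `best_epoch` are restored after stopping, which is what keeps the oversized network from overfitting.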

Hyperparameter tuning

I used RandomizedSearchCV, which samples hyperparameter combinations at random from the given distributions, scores each one, and returns the best-scoring set.

Parameter distributions explored in the search included

  • Activation function
  • Learning rate
  • Loss function
  • Optimization function
  • Dropout rate
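Conceptually, a randomized search over such a space boils down to the loop below. This is a hand-rolled illustration in plain Python, not the RandomizedSearchCV API, which additionally cross-validates each candidate. The candidate values and the `toy_score` function are hypothetical stand-ins for training the network and returning its validation score.

```python
import random

def random_search(score_fn, param_space, n_iter=20, seed=0):
    """Sample n_iter random combinations and return the best (params, score)."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_iter):
        params = {name: rng.choice(values) for name, values in param_space.items()}
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Hypothetical candidate values for the parameters listed above.
param_space = {
    "activation": ["tanh", "relu", "sigmoid"],
    "learning_rate": [1e-4, 1e-3, 1e-2],
    "loss": ["mse", "msle", "mae"],
    "optimizer": ["adam", "sgd", "rmsprop"],
    "dropout": [0.0, 0.2, 0.5],
}

def toy_score(params):
    # Stand-in for "train the model with these settings and return its validation score".
    return -abs(params["learning_rate"] - 1e-3) - abs(params["dropout"] - 0.2)

best_params, best_score = random_search(toy_score, param_space, n_iter=50)
```

Unlike a grid search, the cost is fixed by `n_iter` rather than by the size of the space, which is why it scales to many hyperparameters.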

Hyperparameters used after optimization

Future Work

That was the setup, and in future work I will go into more detail about training, validating, and testing the model.

What makes building a forecasting model challenging is that the crypto market is still nascent and emotional, as is the stock market in general, and news plays a key role in setting new prices for each asset. Since the news is hard to predict, crypto prices are hard to predict too.

Future Work

The RNN model I created is still in development. While it is being improved, it is worth trying other RNN-based algorithms for time series prediction in parallel, such as Amazon SageMaker DeepAR. In addition, I plan to do a deep dive into econometric models used for time series forecasting.

There is still a lot to do in data analytics of the cryptocurrency market. I am planning an advanced analysis to study the correlation between every pair of cryptocurrencies and to take a closer look at the correlation between the crypto market and the media, in terms of the nature of the news itself.

Given the high correlation mentioned above between the news and crypto market prices, future time-series work will include news/media in the feature engineering phase. That will be done by applying sentiment analysis to the news and extracting a positive/negative signal with a confidence score. This could be achieved with a pre-trained sentiment analysis model or with an off-the-shelf service such as Amazon Comprehend.

For navigating the hyperparameter space and for training, testing, and validating the model, I plan to experiment on a more efficient and integrated cloud machine-learning platform such as Amazon SageMaker, using services like Autopilot to automatically train and tune the best machine learning models while maintaining full control and visibility.

As for datasets, I will experiment with other financial time series datasets.

Noor Alsabahi, Lead Data Engineer

References