Data transformation, data conversion and custom tool development for Pori-75 study
Bergman, Jussi (2021)
All rights reserved. This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.
Permanent address of the publication:
https://urn.fi/URN:NBN:fi:amk-2021111620342
Abstract
Objective: To describe the automated dimension engineering process developed specifically for the Pori-75 data using different data conversion methods and measurement scale transformations. The first objective was to prepare the Pori-75 data for effective use with data-analytic (DA) and artificial intelligence (AI) tools. The second objective was to describe the developed transformation tools for blood result categorisation and for ATC coding of medication lists. The third objective was to develop an automatic data selection tool and a fast hypothesis testing tool specifically for the Pori-75 data.
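The abstract does not state how blood results are categorised. One common approach, shown here only as a minimal sketch, labels each value relative to a reference interval as low, normal or high. The column names (`b_hb`, `p_krea`), the limits in `REFERENCE_LIMITS` and the function `categorise_blood_results` are hypothetical and are not taken from the Pori-75 tools; real limits would come from the laboratory's own reference intervals.

```python
import pandas as pd

# HYPOTHETICAL reference intervals (lower, upper), for illustration only.
# These are not clinical limits and not the values used in the Pori-75 study.
REFERENCE_LIMITS = {
    "b_hb": (117.0, 167.0),
    "p_krea": (50.0, 100.0),
}


def categorise_blood_results(df: pd.DataFrame, limits: dict) -> pd.DataFrame:
    """Return a copy of df with a '<column>_cat' column per lab measurement.

    Values below the lower limit become 'low', values above the upper limit
    become 'high', values inside the closed interval become 'normal', and
    missing measurements stay missing.
    """
    out = df.copy()
    for col, (low, high) in limits.items():
        if col not in out.columns:
            continue  # skip lab tests that are absent from this data set
        values = out[col]
        cat = pd.Series("normal", index=out.index, dtype="object")
        cat[values < low] = "low"
        cat[values > high] = "high"
        cat[values.isna()] = pd.NA  # NaN compares False above, so mark it explicitly
        out[f"{col}_cat"] = cat
    return out


if __name__ == "__main__":
    data = pd.DataFrame({"b_hb": [110.0, 140.0, 170.0, None],
                         "p_krea": [45.0, 80.0, 120.0, 90.0]})
    print(categorise_blood_results(data, REFERENCE_LIMITS))
```

Keeping the original numeric column and adding a categorical copy is one way a single measurement can contribute more than one dimension to the data set.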
Methods: Data conversions and measurement scale transformations were applied to the baseline data, comprising 313 medical dimensions from 518 participants in the Pori-75 study in 2020. A tool for “quick-and-dirty” idea and hypothesis testing was developed using an automatic group mean comparison method together with several automated statistical tests and statistics tools. All processes were programmed in Python using industry-standard, open-source, state-of-the-art libraries, including SciPy, NumPy, Pandas, Matplotlib, scikit-learn (SKLearn), Pingouin and Keras. This allows the code to be integrated into custom environments in the future.
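The group mean comparison tool is not described in detail in the abstract. The following is a minimal sketch, assuming a binary grouping column, of how such a screening step could loop over every numeric dimension with SciPy's Welch t-test and Mann-Whitney U test. The function name `compare_group_means` and the Bonferroni adjustment are illustrative choices, not the thesis implementation; Pingouin, also listed in the abstract, offers similar tests.

```python
import pandas as pd
from scipy import stats


def compare_group_means(df: pd.DataFrame, group_col: str, alpha: float = 0.05) -> pd.DataFrame:
    """Screen every numeric column for a difference between two groups.

    For each column, Welch's t-test (no equal-variance assumption) and the
    Mann-Whitney U test are computed. Bonferroni-adjusted p-values guard
    against false positives when many dimensions are tested at once.
    """
    groups = df[group_col].dropna().unique()
    if len(groups) != 2:
        raise ValueError(f"{group_col!r} must define exactly two groups")
    mask_a = df[group_col] == groups[0]
    mask_b = df[group_col] == groups[1]

    rows = []
    for col in df.select_dtypes(include="number").columns:
        if col == group_col:
            continue
        a = df.loc[mask_a, col].dropna()
        b = df.loc[mask_b, col].dropna()
        if len(a) < 2 or len(b) < 2:
            continue  # too few observations for a meaningful test
        welch = stats.ttest_ind(a, b, equal_var=False)
        mwu = stats.mannwhitneyu(a, b, alternative="two-sided")
        rows.append({
            "dimension": col,
            f"mean_{groups[0]}": a.mean(),
            f"mean_{groups[1]}": b.mean(),
            "welch_p": welch.pvalue,
            "mannwhitney_p": mwu.pvalue,
        })

    if not rows:
        return pd.DataFrame()
    result = pd.DataFrame(rows).sort_values("welch_p").reset_index(drop=True)
    result["bonferroni_p"] = (result["welch_p"] * len(result)).clip(upper=1.0)
    result["significant"] = result["bonferroni_p"] < alpha
    return result
```

Because hundreds of dimensions are tested at once, a correction for multiple comparisons (here Bonferroni) is needed before a quick screening hit is treated as more than a lead for proper analysis.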
Results: The original 313 data dimensions were augmented to over 1000 dimensions ready to be used with the latest DA and AI tools. The data conversion and transformation processes were programmed so that new data can easily be integrated yearly on an automated basis. Python functions were developed for quick data selection and fast idea and hypothesis testing, as well as for automatic blood result categorisation and medication-to-ATC-code conversion.
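The abstract does not describe how the medication lists are structured or how the ATC conversion works. As a minimal sketch, assuming a comma-separated free-text list per participant and a small hand-made lookup table (a real tool would need the full WHO ATC index and handling of brand names), the following maps drug names to ATC codes and then expands them into 0/1 indicator columns at several ATC levels. The names `ATC_LOOKUP`, `medications_to_atc` and `atc_indicator_columns` are hypothetical.

```python
import pandas as pd

# ILLUSTRATIVE lookup from lower-case generic drug names to WHO ATC codes.
# A real tool would need the full ATC index plus brand-name handling.
ATC_LOOKUP = {
    "metformin": "A10BA02",
    "simvastatin": "C10AA01",
    "atorvastatin": "C10AA05",
    "ramipril": "C09AA05",
    "paracetamol": "N02BE01",
}


def medications_to_atc(med_list: str, lookup: dict, sep: str = ",") -> list:
    """Map a free-text list such as 'Metformin 500 mg, Ramipril 5 mg' to ATC
    codes; entries missing from the lookup are skipped."""
    codes = []
    for entry in med_list.split(sep):
        name = entry.strip().lower().split(" ")[0]  # keep the drug name, drop the dose
        if name in lookup:
            codes.append(lookup[name])
    return codes


def atc_indicator_columns(df: pd.DataFrame, med_col: str, lookup: dict,
                          prefix_lengths=(1, 3, 7)) -> pd.DataFrame:
    """Add 0/1 columns for each ATC code found, at several hierarchy levels.

    The first ATC character is the anatomical main group, the first three
    characters the therapeutic subgroup and all seven the chemical substance.
    """
    rows = []
    for meds in df[med_col].fillna(""):
        flags = {}
        for code in medications_to_atc(meds, lookup):
            for n in prefix_lengths:
                flags[f"atc_{code[:n]}"] = 1
        rows.append(flags)
    # Participants without a given code get 0 instead of NaN.
    indicators = pd.DataFrame(rows, index=df.index).fillna(0).astype(int)
    return df.join(indicators)


if __name__ == "__main__":
    participants = pd.DataFrame({"meds": ["Metformin 500 mg, Simvastatin 20 mg",
                                          "Paracetamol 1 g", None]})
    print(atc_indicator_columns(participants, "meds", ATC_LOOKUP))
```

With three prefix levels, even a short drug list adds several columns per participant, which is one plausible route to dimension growth of the kind reported; the abstract itself does not say exactly how the 1000+ dimensions were produced.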
Similar items
Showing items with similar titles, authors or subjects.
- Data Strategy Handbook as Guide Towards Data-Driven Organization
  Piippola, Timo-Joel (2024). The need for an organizational data culture is evident in the digital era. More organizations are making data-driven decisions, viewing data as a crucial business asset. This thesis aimed to help a case company enhance its ...
- Big datan käyttö liiketoiminnan ennustamiseen: tieliikenneonnettomuudet Suomessa (The use of big data for business forecasting: road traffic accidents in Finland)
  Alto, Olga (2019). The purpose of this thesis is to determine what information can be predicted from large amounts of data. The material consists of open sources on road traffic accidents in Finland from 2015–2017. The thesis predicts ...
- Recognizing the value of data in business operations : Data analytics for business operation
  Duma, Don (2022). The aim of this study was to demonstrate the hidden value of data that can be extracted with few commercial and open-source software tools. Any given business can collect, organize, and extract data for analysis that can ...