2022 U+ AI Ground (LG Uplus): a 3rd place solution that recommends content customers are expected to prefer, using data from the children's content service.
- CPU: Intel Core i7-11700K (8 cores)
- RAM: 32GB
- GPU: NVIDIA GeForce RTX 3090 Ti
The model was improved by combining various features and layers; a diagram of the model is shown below.
We designed the ensemble around the bagging idea behind random forests: train several models and combine their predictions.
Each model produces a ranked list of items, and the lists are blended with per-model weights, with higher-ranked items receiving a larger share of each model's weight. The implementation is as follows.
```python
from ast import literal_eval
from typing import Dict, List

import pandas as pd


def customize_blend(
    recommends: pd.Series, top_k: int, weighted: List[float]
) -> List[int]:
    """Blend each model's ranked list into a single top-k list.

    Args:
        recommends: row holding each model's predictions as a stringified
            list in the columns ``predicted_list{num}``
        top_k: number of items to return
        weighted: weight assigned to each model's list
    Returns:
        top k items by blended score
    """
    # literal_eval safely parses the stringified lists.
    recommended_items = [
        literal_eval(recommends[f"predicted_list{num}"]) for num in range(len(weighted))
    ]
    res: Dict[int, float] = {}
    for weight, items in zip(weighted, recommended_items):
        for n, v in enumerate(items):
            # An item at rank n receives weight / (n + 1) from that list.
            res[v] = res.get(v, 0.0) + weight / (n + 1)
    # Sort items by blended score, descending, and keep the top k.
    return [item for item, _ in sorted(res.items(), key=lambda kv: -kv[1])][:top_k]
```

After building the ensemble pipeline around the bagging idea of random forests, cross-validation training was performed with GroupKFold over users, under the assumption that similar users share common behavioral characteristics.
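As a quick sanity check of the rank-weighted blending in `customize_blend` above, here is a standalone re-implementation on plain lists with hypothetical inputs (the `pandas` row parsing is omitted):

```python
from typing import Dict, List


def blend(ranked_lists: List[List[int]], weights: List[float], top_k: int) -> List[int]:
    """Mirror customize_blend: rank n in a list contributes weight / (n + 1)."""
    scores: Dict[int, float] = {}
    for weight, items in zip(weights, ranked_lists):
        for rank, item in enumerate(items):
            scores[item] = scores.get(item, 0.0) + weight / (rank + 1)
    return [item for item, _ in sorted(scores.items(), key=lambda kv: -kv[1])][:top_k]


# Item 20 appears in both lists, so its contributions accumulate:
# 1.0/2 from the first list plus 0.5/1 from the second.
print(blend([[10, 20, 30], [20, 40]], [1.0, 0.5], top_k=3))  # [10, 20, 30]
```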
- Train/test split seeds: 94, 95, 96, 99, 123, 317, 529, 705, 1234, 3407
- 5-fold GroupKFold, baseline seeds: 22, 94, 95, 96, 99, 317, 2020, 3407
- 5-fold GroupKFold, my model seeds: 22, 94, 95, 96, 99, 3407
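The per-user split can be sketched with scikit-learn's `GroupKFold`. This is a minimal illustration on synthetic data; the real pipeline's columns and features are not shown in this document:

```python
import numpy as np
from sklearn.model_selection import GroupKFold

# Synthetic interaction log: 100 events from 20 users (hypothetical shapes).
rng = np.random.default_rng(94)
groups = rng.integers(0, 20, size=100)  # user id per interaction
X = rng.random((100, 3))                # dummy feature matrix
y = rng.integers(0, 2, size=100)        # dummy labels

folds = list(GroupKFold(n_splits=5).split(X, y, groups=groups))
for train_idx, valid_idx in folds:
    # GroupKFold guarantees no user appears on both sides of a split.
    assert set(groups[train_idx]).isdisjoint(groups[valid_idx])
print(len(folds))  # 5
```

Keeping all of a user's interactions on one side of the split avoids leaking a user's behavior from train into validation.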
By default, hydra-core==1.2.0 was added to the requirements provided by the competition.
For PyTorch, refer to https://pytorch.org/get-started/previous-versions/ and reinstall the version that matches your environment.
You can install the libraries needed to run the code by typing:

```shell
$ pip install -r requirements.txt
```

Code execution for the new model is as follows:
- Put the base data into the `input/upplus-recsys/` folder. Running the dataset-creation code stores the data for each `fold`, together with `item_features` and `user_features`, in the `input/upplus-recsys/` folder.

  ```shell
  $ python scripts/make_dataset.py models=neucf
  ```
- Running the training shell script trains a model for each `fold`.

  ```shell
  $ sh scripts/train.sh
  ```

  Modifying the script lets you train per `fold` and change the seed value. For example:

  ```shell
  for seed in 22 94 95 96 99 3407
  do
      for fold in 0 1 2 3 4
      do
          python src/train.py models.fold=$fold models.seed=$seed
      done
  done
  ```
- Running the prediction shell script saves the inferred values for each `fold` in the `output` folder.

  ```shell
  $ sh scripts/predict.sh
  ```

  Modifying the script lets you run inference per `fold`; set the seed value to that of the trained model. For example:

  ```shell
  for seed in 22 94 95 96 99 3407
  do
      for fold in 0 1 2 3 4
      do
          python src/predict.py models.fold=$fold models.seed=$seed
      done
  done
  ```
- To ensemble the per-`fold` predictions, edit `config/ensemble.yaml` to point at the desired files.

  ```yaml
  defaults:
    - _self_
    - data: dataset
    - features: features
    - models: neucf
    - hydra: default
    - override hydra/hydra_logging: disabled
    - override hydra/job_logging: disabled

  output:
    path: output
    name: neural-mf-layer3-seed94-group-5fold-ensemble.csv
    submit: sample_submission.csv

  features: features.yaml

  ensemble:
    preds1: neural-mf-layer3-seed94-group-fold0.csv
    preds2: neural-mf-layer3-seed94-group-fold1.csv
    preds3: neural-mf-layer3-seed94-group-fold2.csv
    preds4: neural-mf-layer3-seed94-group-fold3.csv
    preds5: neural-mf-layer3-seed94-group-fold4.csv
    weights:
      - 1
      - 1
      - 1
      - 1
      - 1
  ```
- The ensemble code saves the final result in the `output` folder.

  ```shell
  $ python src/ensemble.py output.name=neural-mf-layer3-seed94-group-5fold-ensemble.csv
  ```
The boosting models showed a significant performance gap compared to the NN models. Ensembling also appeared to have a greater impact on the score than any single model.
The files in the `submit` folder under `output` are the ones we finally submitted.
`best-lb-bootstrap-group-fold-enemble.csv` is an ensemble of the existing baseline models and the models trained with 5-fold GroupKFold. `rank-nural-enemble.csv` combines the existing baseline models, the 5-fold GroupKFold models, and the plain 5-fold models.
- Boosting ranker model: we failed to process the data into a form suitable for learning-to-rank.
- Boosting binary model: binary-classification training took a long time, and even then the model could not separate positives from negatives well; this seems to be caused by a lack of features.
- Boosting ranker after candidate generation: it also discriminated poorly, likely due to the same lack of features.
- With 4 layers, the gap between the CV and LB scores suggests overfitting.
- The graph model took too long to train.