
2022 Uplus AI Ground

This repository contains the 3rd-place solution for the 2022 LG U+ AI Ground competition: recommending content that customers are expected to prefer, based on data from U+'s children's content service.

Setting

  • CPU: Intel Core i7-11700K (8 cores)
  • RAM: 32GB
  • GPU: NVIDIA GeForce RTX 3090 Ti

Learner Architecture

The model was improved by combining various features and layers. The architecture diagram is shown below.

[Figure: Learner Architecture]
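
As a rough illustration, here is a minimal NeuMF-style sketch inferred from the models=neucf and neural-mf-layer3 names used in the run commands below; it is an assumption, not the repo's actual module. A GMF branch and a three-layer MLP branch over user/item embeddings feed a shared prediction head.

import torch
import torch.nn as nn


class NeuMF(nn.Module):
    """Sketch of a neural matrix factorization learner (dimensions assumed)."""

    def __init__(self, n_users: int, n_items: int, dim: int = 32):
        super().__init__()
        self.user_gmf = nn.Embedding(n_users, dim)
        self.item_gmf = nn.Embedding(n_items, dim)
        self.user_mlp = nn.Embedding(n_users, dim)
        self.item_mlp = nn.Embedding(n_items, dim)
        # "layer3": three fully connected layers over the concatenated embeddings.
        self.mlp = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.ReLU(),
            nn.Linear(dim, dim // 2), nn.ReLU(),
            nn.Linear(dim // 2, dim // 4), nn.ReLU(),
        )
        self.head = nn.Linear(dim + dim // 4, 1)

    def forward(self, user: torch.Tensor, item: torch.Tensor) -> torch.Tensor:
        gmf = self.user_gmf(user) * self.item_gmf(item)  # element-wise GMF branch
        mlp = self.mlp(torch.cat([self.user_mlp(user), self.item_mlp(item)], dim=-1))
        return self.head(torch.cat([gmf, mlp], dim=-1)).squeeze(-1)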

Model Architecture

We designed the model around a bagging ensemble, in the style of Random Forest.

[Figure: Model Architecture]

Ensemble Strategy

Each model produces a ranked prediction list; the lists are blended by assigning a weight to each list, scoring every item as its list's weight divided by its rank, and sorting items by the accumulated score. The illustration and implementation are shown below.

[Figure: Ensemble Strategy]

from ast import literal_eval
from typing import List

import pandas as pd


def customize_blend(
    recommends: pd.Series, top_k: int, weighted: List[float]
) -> List[int]:
    """
    Args:
        recommends: row holding each model's predicted list (stringified)
        top_k: number of items to return
        weighted: weight assigned to each predicted list
    Returns:
        top k items by blended score
    """
    # Parse each model's stringified list of recommended item ids.
    recommended_items = [
        literal_eval(recommends[f"predicted_list{num}"])
        for num in range(len(weighted))
    ]
    res = {}

    # Accumulate weight / rank for every item across all lists.
    for weight, items in zip(weighted, recommended_items):
        for n, v in enumerate(items):
            res[v] = res.get(v, 0.0) + weight / (n + 1)

    # Sort items by descending blended score and keep the top k.
    res = [item for item, _ in sorted(res.items(), key=lambda kv: -kv[1])]

    return res[:top_k]
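
A hedged usage sketch: customize_blend is applied row by row to a frame whose predicted_list{i} columns hold each model's stringified recommendations (the values below are made up for illustration).

import pandas as pd

df = pd.DataFrame(
    {
        "predicted_list0": ["[1, 2, 3]", "[4, 1, 2]"],
        "predicted_list1": ["[2, 3, 5]", "[1, 4, 6]"],
    }
)

# Blend each row's two lists with weights 0.6 and 0.4; the first row yields [2, 1, 3].
df["blended"] = df.apply(customize_blend, axis=1, top_k=3, weighted=[0.6, 0.4])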

Model Process

After building an ensemble pipeline in the bagging style of Random Forest, we trained with cross-validation using GroupKFold over users, under the assumption that similar users share common behavioral characteristics.

[Figure: Model Process]
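
A minimal sketch of the GroupKFold scheme described above, assuming an interaction table keyed by a user-id column (the file and column names here are illustrative, not the repo's actual ones):

import pandas as pd
from sklearn.model_selection import GroupKFold

# Hypothetical interaction log; "profile_id" stands in for the user-id column.
interactions = pd.read_csv("input/upplus-recsys/train.csv")

gkf = GroupKFold(n_splits=5)
for fold, (train_idx, valid_idx) in enumerate(
    gkf.split(interactions, groups=interactions["profile_id"])
):
    # Every user's rows land in exactly one validation fold.
    train_df = interactions.iloc[train_idx]
    valid_df = interactions.iloc[valid_idx]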

Seed

  • Train/test split (sketched below): 94, 95, 96, 99, 123, 317, 529, 705, 1234, 3407
  • 5-fold GroupKFold, baseline: 22, 94, 95, 96, 99, 317, 2020, 3407
  • 5-fold GroupKFold, our model: 22, 94, 95, 96, 99, 3407
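
To make the bootstrap-style resampling concrete, here is a minimal sketch (an assumption, not the repo's actual code) pairing the train/test-split seeds above with scikit-learn's train_test_split; fit_model and the 0.2 test size are hypothetical.

from sklearn.model_selection import train_test_split

SPLIT_SEEDS = [94, 95, 96, 99, 123, 317, 529, 705, 1234, 3407]

models = []
for seed in SPLIT_SEEDS:
    # Each seed yields a different resample of the data, Random Forest-style.
    train_df, valid_df = train_test_split(interactions, test_size=0.2, random_state=seed)
    models.append(fit_model(train_df, valid_df))  # hypothetical training helper
# Each trained model later contributes one predicted_list{i} column to customize_blend.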

Requirements

hydra-core==1.2.0 was added on top of the requirements provided by the competition. For PyTorch, refer to https://pytorch.org/get-started/previous-versions/ and install the version that matches your environment.

Install the required libraries with:

$ pip install -r requirements.txt

Run code

Run the code for the new model as follows:

  1. Place the raw data in the input/upplus-recsys/ folder. Running the dataset-creation code stores the per-fold data, together with item_features and user_features, in the input/upplus-recsys/ folder.

    $ python scripts/make_dataset.py models=neucf
  2. Running the training shell script trains a model for each fold.

    $ sh scripts/train.sh

    Modifying the training shell script lets you choose which folds to train and change the seed values. For example:

    for seed in 22 94 95 96 99 3407
    do
        for fold in 0 1 2 3 4
        do
            python src/train.py models.fold=$fold models.seed=$seed
        done
    done
  3. Running the prediction shell script saves each fold's inferred values in the output folder.

    $ sh scripts/predict.sh

    Modifying the prediction shell script lets you run inference for each fold; set the seed value to match the trained model. For example:

    for seed in 22 94 95 96 99 3407
    do
        for fold in 0 1 2 3 4
        do
            python src/predict.py models.fold=$fold models.seed=$seed
        done
    done
  4. To ensemble the per-fold results, modify the config/ensemble.yaml file to point at the files you want to ensemble.

    defaults:
      - _self_
      - data: dataset
      - features: features
      - models: neucf
      - hydra: default
      - override hydra/hydra_logging: disabled
      - override hydra/job_logging: disabled
    
    output:
      path: output
      name: neural-mf-layer3-seed94-group-5fold-ensemble.csv
      submit: sample_submission.csv
      features: features.yaml
    
    ensemble:
      preds1: neural-mf-layer3-seed94-group-fold0.csv
      preds2: neural-mf-layer3-seed94-group-fold1.csv
      preds3: neural-mf-layer3-seed94-group-fold2.csv
      preds4: neural-mf-layer3-seed94-group-fold3.csv
      preds5: neural-mf-layer3-seed94-group-fold4.csv
      weights:
        - 1
        - 1
        - 1
        - 1
        - 1
  5. The ensemble script saves the final result in the output folder.

    $ python src/ensemble.py output.name=neural-mf-layer3-seed94-group-5fold-ensemble.csv
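
For reference, here is a minimal sketch of how the Hydra config above could drive the blend. It is an assumed outline, not the repo's actual src/ensemble.py; the predicted_list column name and top_k=25 are guesses, and customize_blend is the function defined earlier.

import hydra
import pandas as pd
from omegaconf import DictConfig


@hydra.main(config_path="../config", config_name="ensemble", version_base=None)
def _main(cfg: DictConfig) -> None:
    # Collect each fold's predictions into predicted_list{i} columns.
    names = [cfg.ensemble[f"preds{i + 1}"] for i in range(len(cfg.ensemble.weights))]
    frame = pd.DataFrame(
        {
            f"predicted_list{i}": pd.read_csv(f"{cfg.output.path}/{name}")["predicted_list"]
            for i, name in enumerate(names)
        }
    )
    # Blend row by row with the configured weights and write the submission.
    submission = pd.read_csv(f"{cfg.output.path}/{cfg.output.submit}")
    submission["predicted_list"] = frame.apply(
        customize_blend, axis=1, top_k=25, weighted=list(cfg.ensemble.weights)
    )
    submission.to_csv(f"{cfg.output.path}/{cfg.output.name}", index=False)


if __name__ == "__main__":
    _main()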

Benchmark

[Figure: Benchmark results]

There is a significant performance gap between the boosting models and the NN models. Ensembling also had a greater impact on the score than any single model.

Submit

The submit files in the output folder are the ones we finally submitted. best-lb-bootstrap-group-fold-enemble.csv is an ensemble of the existing baseline models and the models trained with 5-fold GroupKFold. rank-nural-enemble.csv additionally combines the existing baseline models, the 5-fold GroupKFold models, and the 5-fold (plain KFold) models.

Doesn't Work

  • Boosting ranker model: we failed to process the data into a form suitable for learning to rank.
  • Boosting binary model: binary classification training took too long, and the model still could not discriminate well; this appears to be caused by a lack of features.
  • Boosting ranker model after candidate generation: again discriminated poorly, likely due to the same lack of features.
  • 4-layer variant: the gap between the CV and LB scores suggests overfitting.
  • Graph model: training took too long.
